Simple trie class in C++
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
4
down vote
favorite
I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.
Some questions I have:
- Is my
create
member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator? - Is there a way that I could avoid using the virtual function
get_word
? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address wherethis->word
could reside.
General tips are appreciated, not just answers to these questions.
Thanks!
#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>
class TrieNode
public:
std::unordered_map<char, TrieNode*> children;
TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);
return *this;
virtual ~TrieNode()
for (auto const& it: children)
delete it.second;
void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;
;
class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();
;
void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);
class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());
for (auto const& c_it: node->children)
apply_all(c_it.second, function);
public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];
//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);
bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;
return node->children.find('') != node->children.end();
void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;
apply_all(node, function);
;
int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);
c++ c++11 polymorphism trie
add a comment |Â
up vote
4
down vote
favorite
I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.
Some questions I have:
- Is my
create
member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator? - Is there a way that I could avoid using the virtual function
get_word
? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address wherethis->word
could reside.
General tips are appreciated, not just answers to these questions.
Thanks!
#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>
class TrieNode
public:
std::unordered_map<char, TrieNode*> children;
TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);
return *this;
virtual ~TrieNode()
for (auto const& it: children)
delete it.second;
void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;
;
class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();
;
void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);
class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());
for (auto const& c_it: node->children)
apply_all(c_it.second, function);
public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];
//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);
bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;
return node->children.find('') != node->children.end();
void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;
apply_all(node, function);
;
int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);
c++ c++11 polymorphism trie
To answer the second question:static_cast<TerminalTrieNode*>(_)->word
works.
â Reid Hayes
Jul 17 at 2:32
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.
Some questions I have:
- Is my
create
member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator? - Is there a way that I could avoid using the virtual function
get_word
? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address wherethis->word
could reside.
General tips are appreciated, not just answers to these questions.
Thanks!
#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>
class TrieNode
public:
std::unordered_map<char, TrieNode*> children;
TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);
return *this;
virtual ~TrieNode()
for (auto const& it: children)
delete it.second;
void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;
;
class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();
;
void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);
class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());
for (auto const& c_it: node->children)
apply_all(c_it.second, function);
public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];
//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);
bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;
return node->children.find('') != node->children.end();
void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;
apply_all(node, function);
;
int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);
c++ c++11 polymorphism trie
I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.
Some questions I have:
- Is my
create
member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator? - Is there a way that I could avoid using the virtual function
get_word
? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address wherethis->word
could reside.
General tips are appreciated, not just answers to these questions.
Thanks!
#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>
class TrieNode
public:
std::unordered_map<char, TrieNode*> children;
TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);
return *this;
virtual ~TrieNode()
for (auto const& it: children)
delete it.second;
void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;
;
class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();
;
void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);
class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());
for (auto const& c_it: node->children)
apply_all(c_it.second, function);
public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];
//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);
bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;
return node->children.find('') != node->children.end();
void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;
apply_all(node, function);
;
int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);
c++ c++11 polymorphism trie
asked Jul 17 at 1:56
Reid Hayes
233
233
To answer the second question:static_cast<TerminalTrieNode*>(_)->word
works.
â Reid Hayes
Jul 17 at 2:32
add a comment |Â
To answer the second question:static_cast<TerminalTrieNode*>(_)->word
works.
â Reid Hayes
Jul 17 at 2:32
To answer the second question:
static_cast<TerminalTrieNode*>(_)->word
works.â Reid Hayes
Jul 17 at 2:32
To answer the second question:
static_cast<TerminalTrieNode*>(_)->word
works.â Reid Hayes
Jul 17 at 2:32
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
accepted
Design
I don't think that TerminalTrieNode
is that good of an idea, for multiple reasons:
You effectively cannot insert byte-strings that contain the
char
value''
. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.Having to use pointers in
TrieNode::children
(due to inheritance) meansdoublingincreasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in thestd::unordered_map
. This makes all operations slower due to cache misses.The whole
get_word
"hack" smells (as you noticed).
Also, it would be nice to retrieve information about which path a node is on.
Ideally, I'd like something along these lines:
class TrieNode
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;
// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;
// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;
public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;
Sadly, this won't compile, as
std::unordered_map
doesn't support incomplete types (TrieNode
is incomplete at the declaration ofTrieNode::children
). One easy solution would be using another hash map type that supports incomplete types, e.g.boost::unordered_map
.
But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.
Other issues
Trie::prefix_apply
only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)
) or possibly member functions (including the corresponding object). This can be fixed in two ways:
Take a
std::function<void(std::string const&)>
as parameter. This supports all of the above, but might incur a heap allocation upon construction.Template it on a
Callable&&
which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).
A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.
add a comment |Â
up vote
2
down vote
Looking at this code, I have several remarks, let me start with your questions:
Is my create member function idiomatic?
No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.
In this case, it looks better to have a virtual method clone
on the nose or make the structure templated (preferred by me).
Is there a way that I could avoid using the virtual function get_word?
In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
For this specific case, I realize this ain't possible, however, it does make sense having only one node.
Ownership
You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.
Relevance of member functions
prefix_apply
is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?
At the same time, it is very specific, so you could replace it by the visitor pattern.
Rule of five
Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.
Public members
Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.
Purpose and performance
It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set
or just a vector on which to unique/erase.
Both variants will be more memory efficient and most like more performant.
Atrie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't~TrieNode()
recursively delete all of the nodes children before deleting itself?
â Reid Hayes
Jul 17 at 18:12
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
Design
I don't think that TerminalTrieNode
is that good of an idea, for multiple reasons:
You effectively cannot insert byte-strings that contain the
char
value''
. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.Having to use pointers in
TrieNode::children
(due to inheritance) meansdoublingincreasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in thestd::unordered_map
. This makes all operations slower due to cache misses.The whole
get_word
"hack" smells (as you noticed).
Also, it would be nice to retrieve information about which path a node is on.
Ideally, I'd like something along these lines:
class TrieNode
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;
// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;
// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;
public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;
Sadly, this won't compile, as
std::unordered_map
doesn't support incomplete types (TrieNode
is incomplete at the declaration ofTrieNode::children
). One easy solution would be using another hash map type that supports incomplete types, e.g.boost::unordered_map
.
But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.
Other issues
Trie::prefix_apply
only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)
) or possibly member functions (including the corresponding object). This can be fixed in two ways:
Take a
std::function<void(std::string const&)>
as parameter. This supports all of the above, but might incur a heap allocation upon construction.Template it on a
Callable&&
which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).
A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.
add a comment |Â
up vote
2
down vote
accepted
Design
I don't think that TerminalTrieNode
is that good of an idea, for multiple reasons:
You effectively cannot insert byte-strings that contain the
char
value''
. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.Having to use pointers in
TrieNode::children
(due to inheritance) meansdoublingincreasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in thestd::unordered_map
. This makes all operations slower due to cache misses.The whole
get_word
"hack" smells (as you noticed).
Also, it would be nice to retrieve information about which path a node is on.
Ideally, I'd like something along these lines:
class TrieNode
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;
// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;
// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;
public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;
Sadly, this won't compile, as
std::unordered_map
doesn't support incomplete types (TrieNode
is incomplete at the declaration ofTrieNode::children
). One easy solution would be using another hash map type that supports incomplete types, e.g.boost::unordered_map
.
But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.
Other issues
Trie::prefix_apply
only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)
) or possibly member functions (including the corresponding object). This can be fixed in two ways:
Take a
std::function<void(std::string const&)>
as parameter. This supports all of the above, but might incur a heap allocation upon construction.Template it on a
Callable&&
which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).
A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.
add a comment |Â
up vote
2
down vote
accepted
up vote
2
down vote
accepted
Design
I don't think that TerminalTrieNode
is that good of an idea, for multiple reasons:
You effectively cannot insert byte-strings that contain the
char
value''
. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.Having to use pointers in
TrieNode::children
(due to inheritance) meansdoublingincreasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in thestd::unordered_map
. This makes all operations slower due to cache misses.The whole
get_word
"hack" smells (as you noticed).
Also, it would be nice to retrieve information about which path a node is on.
Ideally, I'd like something along these lines:
class TrieNode
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;
// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;
// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;
public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;
Sadly, this won't compile, as
std::unordered_map
doesn't support incomplete types (TrieNode
is incomplete at the declaration ofTrieNode::children
). One easy solution would be using another hash map type that supports incomplete types, e.g.boost::unordered_map
.
But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.
Other issues
Trie::prefix_apply
only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)
) or possibly member functions (including the corresponding object). This can be fixed in two ways:
Take a
std::function<void(std::string const&)>
as parameter. This supports all of the above, but might incur a heap allocation upon construction.Template it on a
Callable&&
which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).
A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.
Design
I don't think that TerminalTrieNode
is that good of an idea, for multiple reasons:
You effectively cannot insert byte-strings that contain the
char
value''
. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.Having to use pointers in
TrieNode::children
(due to inheritance) meansdoublingincreasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in thestd::unordered_map
. This makes all operations slower due to cache misses.The whole
get_word
"hack" smells (as you noticed).
Also, it would be nice to retrieve information about which path a node is on.
Ideally, I'd like something along these lines:
class TrieNode
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;
// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;
// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;
public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;
Sadly, this won't compile, as
std::unordered_map
doesn't support incomplete types (TrieNode
is incomplete at the declaration ofTrieNode::children
). One easy solution would be using another hash map type that supports incomplete types, e.g.boost::unordered_map
.
But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.
Other issues
Trie::prefix_apply
only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)
) or possibly member functions (including the corresponding object). This can be fixed in two ways:
Take a
std::function<void(std::string const&)>
as parameter. This supports all of the above, but might incur a heap allocation upon construction.Template it on a
Callable&&
which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).
A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.
answered Jul 19 at 16:06
hoffmale
4,205630
4,205630
add a comment |Â
add a comment |Â
up vote
2
down vote
Looking at this code, I have several remarks, let me start with your questions:
Is my create member function idiomatic?
No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.
In this case, it looks better to have a virtual method clone
on the nose or make the structure templated (preferred by me).
Is there a way that I could avoid using the virtual function get_word?
In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
For this specific case, I realize this ain't possible, however, it does make sense having only one node.
Ownership
You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.
Relevance of member functions
prefix_apply
is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?
At the same time, it is very specific, so you could replace it by the visitor pattern.
Rule of five
Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.
Public members
Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.
Purpose and performance
It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set
or just a vector on which to unique/erase.
Both variants will be more memory efficient and most like more performant.
Atrie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't~TrieNode()
recursively delete all of the nodes children before deleting itself?
â Reid Hayes
Jul 17 at 18:12
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
add a comment |Â
up vote
2
down vote
Looking at this code, I have several remarks, let me start with your questions:
Is my create member function idiomatic?
No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.
In this case, it looks better to have a virtual method clone
on the nose or make the structure templated (preferred by me).
Is there a way that I could avoid using the virtual function get_word?
In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
For this specific case, I realize this ain't possible, however, it does make sense having only one node.
Ownership
You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.
Relevance of member functions
prefix_apply
is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?
At the same time, it is very specific, so you could replace it by the visitor pattern.
Rule of five
Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.
Public members
Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.
Purpose and performance
It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set
or just a vector on which to unique/erase.
Both variants will be more memory efficient and most like more performant.
Atrie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't~TrieNode()
recursively delete all of the nodes children before deleting itself?
â Reid Hayes
Jul 17 at 18:12
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
add a comment |Â
up vote
2
down vote
up vote
2
down vote
Looking at this code, I have several remarks, let me start with your questions:
Is my create member function idiomatic?
No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.
In this case, it looks better to have a virtual method clone
on the nose or make the structure templated (preferred by me).
Is there a way that I could avoid using the virtual function get_word?
In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
For this specific case, I realize this ain't possible, however, it does make sense having only one node.
Ownership
You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.
Relevance of member functions
prefix_apply
is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?
At the same time, it is very specific, so you could replace it by the visitor pattern.
Rule of five
Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.
Public members
Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.
Purpose and performance
It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set
or just a vector on which to unique/erase.
Both variants will be more memory efficient and most like more performant.
Looking at this code, I have several remarks, let me start with your questions:
Is my create member function idiomatic?
No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.
In this case, it looks better to have a virtual method clone
on the nose or make the structure templated (preferred by me).
Is there a way that I could avoid using the virtual function get_word?
In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
For this specific case, I realize this ain't possible, however, it does make sense having only one node.
Ownership
You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.
Relevance of member functions
prefix_apply
is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?
At the same time, it is very specific, so you could replace it by the visitor pattern.
Rule of five
Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.
Public members
Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.
Purpose and performance
It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set
or just a vector on which to unique/erase.
Both variants will be more memory efficient and most like more performant.
edited Jul 17 at 17:53
answered Jul 17 at 16:14
JVApen
516212
516212
Atrie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't~TrieNode()
recursively delete all of the nodes children before deleting itself?
â Reid Hayes
Jul 17 at 18:12
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
add a comment |Â
Atrie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't~TrieNode()
recursively delete all of the nodes children before deleting itself?
â Reid Hayes
Jul 17 at 18:12
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
A
trie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.â 1201ProgramAlarm
Jul 17 at 17:44
A
trie
is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.â 1201ProgramAlarm
Jul 17 at 17:44
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
I stand corrected, let me remove
â JVApen
Jul 17 at 17:52
Thanks for your response! Are there really memory leaks? Doesn't
~TrieNode()
recursively delete all of the nodes children before deleting itself?â Reid Hayes
Jul 17 at 18:12
Thanks for your response! Are there really memory leaks? Doesn't
~TrieNode()
recursively delete all of the nodes children before deleting itself?â Reid Hayes
Jul 17 at 18:12
1
1
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Operator= doesn't delete the nodes
â JVApen
Jul 17 at 18:13
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
â Reid Hayes
Jul 17 at 18:16
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199643%2fsimple-trie-class-in-c%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
To answer the second question:
static_cast<TerminalTrieNode*>(_)->word
works.â Reid Hayes
Jul 17 at 2:32