Simple trie class in C++

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.



Some questions I have:



  • Is my create member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator?

  • Is there a way that I could avoid using the virtual function get_word? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address where this->word could reside.

General tips are appreciated, not just answers to these questions.



Thanks!



#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>

class TrieNode
public:
std::unordered_map<char, TrieNode*> children;

TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);

return *this;

virtual ~TrieNode()
for (auto const& it: children)
delete it.second;


void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;

;

class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();

;

void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);



class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());

for (auto const& c_it: node->children)
apply_all(c_it.second, function);

public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];

//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);


bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;

return node->children.find('') != node->children.end();


void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;

apply_all(node, function);

;

int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);







share|improve this question



















  • To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
    – Reid Hayes
    Jul 17 at 2:32

















up vote
4
down vote

favorite












I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.



Some questions I have:



  • Is my create member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator?

  • Is there a way that I could avoid using the virtual function get_word? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address where this->word could reside.

General tips are appreciated, not just answers to these questions.



Thanks!



#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>

class TrieNode
public:
std::unordered_map<char, TrieNode*> children;

TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);

return *this;

virtual ~TrieNode()
for (auto const& it: children)
delete it.second;


void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;

;

class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();

;

void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);



class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());

for (auto const& c_it: node->children)
apply_all(c_it.second, function);

public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];

//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);


bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;

return node->children.find('') != node->children.end();


void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;

apply_all(node, function);

;

int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);







share|improve this question



















  • To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
    – Reid Hayes
    Jul 17 at 2:32













up vote
4
down vote

favorite









up vote
4
down vote

favorite











I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.



Some questions I have:



  • Is my create member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator?

  • Is there a way that I could avoid using the virtual function get_word? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address where this->word could reside.

General tips are appreciated, not just answers to these questions.



Thanks!



#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>

class TrieNode
public:
std::unordered_map<char, TrieNode*> children;

TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);

return *this;

virtual ~TrieNode()
for (auto const& it: children)
delete it.second;


void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;

;

class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();

;

void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);



class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());

for (auto const& c_it: node->children)
apply_all(c_it.second, function);

public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];

//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);


bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;

return node->children.find('') != node->children.end();


void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;

apply_all(node, function);

;

int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);







share|improve this question











I would be very grateful to get some thoughts on this toy-implementation of a trie that I wrote to practice my C++.



Some questions I have:



  • Is my create member function idiomatic? What's the standard way avoid duplicated copy in the copy-constructor and assignment operator?

  • Is there a way that I could avoid using the virtual function get_word? It seems safer to use a virtual function (to avoid accessing undefined memory), but it I don't see why I shouldn't be able to lookup the address where this->word could reside.

General tips are appreciated, not just answers to these questions.



Thanks!



#include <unordered_map>
#include <string>
#include <iostream>
#include <stdexcept>

class TrieNode
public:
std::unordered_map<char, TrieNode*> children;

TrieNode()
TrieNode(TrieNode const& trie_node) create(trie_node); ;
TrieNode& operator=(TrieNode const& rhs)
if (&rhs != this)
children.clear();
create(rhs);

return *this;

virtual ~TrieNode()
for (auto const& it: children)
delete it.second;


void create(TrieNode const&);
virtual std::string const& get_word() const throw std::domain_error("get_word() called on non-terminal node."); ;

;

class TerminalTrieNode: public TrieNode
// terminal nodes represent full words
// they are pointed to by edges with key ''
public:
std::string word;
std::string const& get_word() const return word;
TerminalTrieNode(std::string word): word(word)
TerminalTrieNode(TrieNode const* terminal_trie_node)
create(*terminal_trie_node);
word = terminal_trie_node->get_word();

;

void TrieNode::create(TrieNode const& trie_node)
for (auto const& it: trie_node.children)
if (it.first != '')
children[it.first] = new TrieNode(*it.second);
else
children[it.first] = new TerminalTrieNode(it.second);



class Trie
private:
TrieNode root;
void apply_all(TrieNode const* node, void (function(std::string const&))) const
// helper function for prefix_apply
// applies the given function to every word in the trie below the given word
auto it = node->children.find('');
if (it != node->children.end())
function(it->second->get_word());

for (auto const& c_it: node->children)
apply_all(c_it.second, function);

public:
void insert(std::string const& word)
// add a new word to the trie
TrieNode* node = &root;
for (char const c: word)
if (!node->children[c])
node->children[c] = new TrieNode();
node = node->children[c];

//mark as a terminal node
node->children[''] = new TerminalTrieNode(word);


bool contains(std::string const& word) const
// set-membership test
TrieNode const* node = &root;
for (char const c: word)
auto it = node->children.find(c);
if (it == node->children.end()) return false;
node = it->second;

return node->children.find('') != node->children.end();


void prefix_apply(std::string const& prefix, void (function(std::string const&))) const
// Apply the given function to every word in the trie that has the given prefix
TrieNode const* node = &root;
for (char const c: prefix)
auto it = node->children.find(c);
if (it == node->children.end()) return;
node = it->second;

apply_all(node, function);

;

int main(int argc, char** argv)
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apl");
trie.insert("back");
trie.insert("appl");
Trie new_trie = trie;
new_trie.prefix_apply("app", (std::string const& s) std::cout << s << 'n';);









share|improve this question










share|improve this question




share|improve this question









asked Jul 17 at 1:56









Reid Hayes

233




233











  • To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
    – Reid Hayes
    Jul 17 at 2:32

















  • To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
    – Reid Hayes
    Jul 17 at 2:32
















To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
– Reid Hayes
Jul 17 at 2:32





To answer the second question: static_cast<TerminalTrieNode*>(_)->word works.
– Reid Hayes
Jul 17 at 2:32











2 Answers
2






active

oldest

votes

















up vote
2
down vote



accepted










Design



I don't think that TerminalTrieNode is that good of an idea, for multiple reasons:



  • You effectively cannot insert byte-strings that contain the char value ''. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.


  • Having to use pointers in TrieNode::children (due to inheritance) means doubling increasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in the std::unordered_map. This makes all operations slower due to cache misses.


  • The whole get_word "hack" smells (as you noticed).


Also, it would be nice to retrieve information about which path a node is on.



Ideally, I'd like something along these lines:



class TrieNode 
// Note: store TrieNode by value!
std::unordered_map<char, TrieNode> children;

// simple indicator if this is a leaf node (end of a word)
bool is_leaf_node;

// A reference to the path so far
// Note: if memory usage gets too big, this could be replaced by a std::string_view
// and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
// a part of the stored string.
std::string path;

public:
TrieNode(std::string);
std::string_view word() const;
void make_leaf();
bool is_leaf() const;
;



Sadly, this won't compile, as std::unordered_map doesn't support incomplete types (TrieNode is incomplete at the declaration of TrieNode::children). One easy solution would be using another hash map type that supports incomplete types, e.g. boost::unordered_map.



But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.




Other issues



Trie::prefix_apply only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)) or possibly member functions (including the corresponding object). This can be fixed in two ways:



  1. Take a std::function<void(std::string const&)> as parameter. This supports all of the above, but might incur a heap allocation upon construction.


  2. Template it on a Callable&& which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).



A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.







share|improve this answer




























    up vote
    2
    down vote













    Looking at this code, I have several remarks, let me start with your questions:



    Is my create member function idiomatic?



    No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.



    In this case, it looks better to have a virtual method clone on the nose or make the structure templated (preferred by me).



    Is there a way that I could avoid using the virtual function get_word?



    In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
    For this specific case, I realize this ain't possible, however, it does make sense having only one node.



    Ownership



    You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.



    Relevance of member functions



    prefix_apply is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?



    At the same time, it is very specific, so you could replace it by the visitor pattern.



    Rule of five



    Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.



    Public members



    Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.



    Purpose and performance



    It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set or just a vector on which to unique/erase.
    Both variants will be more memory efficient and most like more performant.






    share|improve this answer























    • A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
      – 1201ProgramAlarm
      Jul 17 at 17:44










    • I stand corrected, let me remove
      – JVApen
      Jul 17 at 17:52










    • Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
      – Reid Hayes
      Jul 17 at 18:12






    • 1




      Operator= doesn't delete the nodes
      – JVApen
      Jul 17 at 18:13










    • Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
      – Reid Hayes
      Jul 17 at 18:16










    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199643%2fsimple-trie-class-in-c%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote



    accepted










    Design



    I don't think that TerminalTrieNode is that good of an idea, for multiple reasons:



    • You effectively cannot insert byte-strings that contain the char value ''. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.


    • Having to use pointers in TrieNode::children (due to inheritance) means doubling increasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in the std::unordered_map. This makes all operations slower due to cache misses.


    • The whole get_word "hack" smells (as you noticed).


    Also, it would be nice to retrieve information about which path a node is on.



    Ideally, I'd like something along these lines:



    class TrieNode 
    // Note: store TrieNode by value!
    std::unordered_map<char, TrieNode> children;

    // simple indicator if this is a leaf node (end of a word)
    bool is_leaf_node;

    // A reference to the path so far
    // Note: if memory usage gets too big, this could be replaced by a std::string_view
    // and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
    // a part of the stored string.
    std::string path;

    public:
    TrieNode(std::string);
    std::string_view word() const;
    void make_leaf();
    bool is_leaf() const;
    ;



    Sadly, this won't compile, as std::unordered_map doesn't support incomplete types (TrieNode is incomplete at the declaration of TrieNode::children). One easy solution would be using another hash map type that supports incomplete types, e.g. boost::unordered_map.



    But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.




    Other issues



    Trie::prefix_apply only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)) or possibly member functions (including the corresponding object). This can be fixed in two ways:



    1. Take a std::function<void(std::string const&)> as parameter. This supports all of the above, but might incur a heap allocation upon construction.


    2. Template it on a Callable&& which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).



    A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.







    share|improve this answer

























      up vote
      2
      down vote



      accepted










      Design



      I don't think that TerminalTrieNode is that good of an idea, for multiple reasons:



      • You effectively cannot insert byte-strings that contain the char value ''. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.


      • Having to use pointers in TrieNode::children (due to inheritance) means doubling increasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in the std::unordered_map. This makes all operations slower due to cache misses.


      • The whole get_word "hack" smells (as you noticed).


      Also, it would be nice to retrieve information about which path a node is on.



      Ideally, I'd like something along these lines:



      class TrieNode 
      // Note: store TrieNode by value!
      std::unordered_map<char, TrieNode> children;

      // simple indicator if this is a leaf node (end of a word)
      bool is_leaf_node;

      // A reference to the path so far
      // Note: if memory usage gets too big, this could be replaced by a std::string_view
      // and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
      // a part of the stored string.
      std::string path;

      public:
      TrieNode(std::string);
      std::string_view word() const;
      void make_leaf();
      bool is_leaf() const;
      ;



      Sadly, this won't compile, as std::unordered_map doesn't support incomplete types (TrieNode is incomplete at the declaration of TrieNode::children). One easy solution would be using another hash map type that supports incomplete types, e.g. boost::unordered_map.



      But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.




      Other issues



      Trie::prefix_apply only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)) or possibly member functions (including the corresponding object). This can be fixed in two ways:



      1. Take a std::function<void(std::string const&)> as parameter. This supports all of the above, but might incur a heap allocation upon construction.


      2. Template it on a Callable&& which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).



      A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.







      share|improve this answer























        up vote
        2
        down vote



        accepted







        up vote
        2
        down vote



        accepted






        Design



        I don't think that TerminalTrieNode is that good of an idea, for multiple reasons:



        • You effectively cannot insert byte-strings that contain the char value ''. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.


        • Having to use pointers in TrieNode::children (due to inheritance) means doubling increasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in the std::unordered_map. This makes all operations slower due to cache misses.


        • The whole get_word "hack" smells (as you noticed).


        Also, it would be nice to retrieve information about which path a node is on.



        Ideally, I'd like something along these lines:



        class TrieNode 
        // Note: store TrieNode by value!
        std::unordered_map<char, TrieNode> children;

        // simple indicator if this is a leaf node (end of a word)
        bool is_leaf_node;

        // A reference to the path so far
        // Note: if memory usage gets too big, this could be replaced by a std::string_view
        // and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
        // a part of the stored string.
        std::string path;

        public:
        TrieNode(std::string);
        std::string_view word() const;
        void make_leaf();
        bool is_leaf() const;
        ;



        Sadly, this won't compile, as std::unordered_map doesn't support incomplete types (TrieNode is incomplete at the declaration of TrieNode::children). One easy solution would be using another hash map type that supports incomplete types, e.g. boost::unordered_map.



        But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.




        Other issues



        Trie::prefix_apply only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)) or possibly member functions (including the corresponding object). This can be fixed in two ways:



        1. Take a std::function<void(std::string const&)> as parameter. This supports all of the above, but might incur a heap allocation upon construction.


        2. Template it on a Callable&& which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).



        A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.







        share|improve this answer













        Design



        I don't think that TerminalTrieNode is that good of an idea, for multiple reasons:



        • You effectively cannot insert byte-strings that contain the char value ''. While this isn't likely to come up in purely ASCII based applications, it might e.g. for variants of unicode or handling of raw byte strings.


        • Having to use pointers in TrieNode::children (due to inheritance) means doubling increasing the amount of pointer dereferences: One for the actual pointer, and at least one hidden in the std::unordered_map. This makes all operations slower due to cache misses.


        • The whole get_word "hack" smells (as you noticed).


        Also, it would be nice to retrieve information about which path a node is on.



        Ideally, I'd like something along these lines:



        class TrieNode 
        // Note: store TrieNode by value!
        std::unordered_map<char, TrieNode> children;

        // simple indicator if this is a leaf node (end of a word)
        bool is_leaf_node;

        // A reference to the path so far
        // Note: if memory usage gets too big, this could be replaced by a std::string_view
        // and storing the leaf strings in a dedicated pool. Non-leaf nodes would only refer to
        // a part of the stored string.
        std::string path;

        public:
        TrieNode(std::string);
        std::string_view word() const;
        void make_leaf();
        bool is_leaf() const;
        ;



        Sadly, this won't compile, as std::unordered_map doesn't support incomplete types (TrieNode is incomplete at the declaration of TrieNode::children). One easy solution would be using another hash map type that supports incomplete types, e.g. boost::unordered_map.



        But: This solution sidesteps a lot (if not all) of the problems mentioned in @JVApen's answer.




        Other issues



        Trie::prefix_apply only takes non-member functions, i.e. no function objects (like capturing lambdas or instance of classes with overloaded operator(std::string const&)) or possibly member functions (including the corresponding object). This can be fixed in two ways:



        1. Take a std::function<void(std::string const&)> as parameter. This supports all of the above, but might incur a heap allocation upon construction.


        2. Template it on a Callable&& which gets deduced for the passed function. This doesn't incur a heap allocation, but it also doesn't support the (admittedly rare case) usage of pointer to member functions. It also might cause some binary bloat, as a new function is generated for each instantiation (though they might provide more optimization opportunities, e.g. inlining).



        A different design choice would be providing an iterator interface, though implementation of that is a bit more work. The advantage of that would be better compatibility with standard algorithms.








        share|improve this answer













        share|improve this answer



        share|improve this answer











        answered Jul 19 at 16:06









        hoffmale

        4,205630




        4,205630






















            up vote
            2
            down vote













            Looking at this code, I have several remarks, let me start with your questions:



            Is my create member function idiomatic?



            No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.



            In this case, it looks better to have a virtual method clone on the nose or make the structure templated (preferred by me).



            Is there a way that I could avoid using the virtual function get_word?



            In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
            For this specific case, I realize this ain't possible, however, it does make sense having only one node.



            Ownership



            You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.



            Relevance of member functions



            prefix_apply is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?



            At the same time, it is very specific, so you could replace it by the visitor pattern.



            Rule of five



            Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.



            Public members



            Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.



            Purpose and performance



            It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set or just a vector on which to unique/erase.
            Both variants will be more memory efficient and most like more performant.






            share|improve this answer























            • A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
              – 1201ProgramAlarm
              Jul 17 at 17:44










            • I stand corrected, let me remove
              – JVApen
              Jul 17 at 17:52










            • Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
              – Reid Hayes
              Jul 17 at 18:12






            • 1




              Operator= doesn't delete the nodes
              – JVApen
              Jul 17 at 18:13










            • Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
              – Reid Hayes
              Jul 17 at 18:16














            up vote
            2
            down vote













            Looking at this code, I have several remarks, let me start with your questions:



            Is my create member function idiomatic?



            No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.



            In this case, it looks better to have a virtual method clone on the nose or make the structure templated (preferred by me).



            Is there a way that I could avoid using the virtual function get_word?



            In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
            For this specific case, I realize this ain't possible, however, it does make sense having only one node.



            Ownership



            You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.



            Relevance of member functions



            prefix_apply is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?



            At the same time, it is very specific, so you could replace it by the visitor pattern.



            Rule of five



            Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.



            Public members



            Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.



            Purpose and performance



            It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set or just a vector on which to unique/erase.
            Both variants will be more memory efficient and most like more performant.






            share|improve this answer























            • A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
              – 1201ProgramAlarm
              Jul 17 at 17:44










            • I stand corrected, let me remove
              – JVApen
              Jul 17 at 17:52










            • Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
              – Reid Hayes
              Jul 17 at 18:12






            • 1




              Operator= doesn't delete the nodes
              – JVApen
              Jul 17 at 18:13










            • Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
              – Reid Hayes
              Jul 17 at 18:16












            up vote
            2
            down vote










            up vote
            2
            down vote









            Looking at this code, I have several remarks, let me start with your questions:



            Is my create member function idiomatic?



            No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.



            In this case, it looks better to have a virtual method clone on the nose or make the structure templated (preferred by me).



            Is there a way that I could avoid using the virtual function get_word?



            In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
            For this specific case, I realize this ain't possible, however, it does make sense having only one node.



            Ownership



            You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.



            Relevance of member functions



            prefix_apply is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?



            At the same time, it is very specific, so you could replace it by the visitor pattern.



            Rule of five



            Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.



            Public members



            Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.



            Purpose and performance



            It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set or just a vector on which to unique/erase.
            Both variants will be more memory efficient and most like more performant.






            share|improve this answer















            Looking at this code, I have several remarks, let me start with your questions:



            Is my create member function idiomatic?



            No, I barely see it. Mostly because it doesn't add value. Either classes are small enough so it doesn't matter, or copy/assign ain't supported, or it can be defaulted.



            In this case, it looks better to have a virtual method clone on the nose or make the structure templated (preferred by me).



            Is there a way that I could avoid using the virtual function get_word?



            In general I would make the node type would be a template argument, yes. Otherwise, you could cast (similar to CRTP), however that requires knowledge of the available types.
            For this specific case, I realize this ain't possible, however, it does make sense having only one node.



            Ownership



            You have some memory leaks in your code. I suggest to use a simple rule of thumb: always use unique_ptr instead of new/delete. Or when possible, just store by value in the map.



            Relevance of member functions



            prefix_apply is a very good example of a function that should be a free function. You make a copy and adapt the whole structure. Why not build up a new map?



            At the same time, it is very specific, so you could replace it by the visitor pattern.



            Rule of five



            Applying the rule of five allows you to move instead of copying. Most of the time, I even write move operations with the copy variant deleted.



            Public members



            Just don't, this is very hard to maintain. Especially when you encounter a specific implementation is buggy.



            Purpose and performance



            It looks like you are trying to make an ever growing string pool in which you want to share strings. How about using std::unordered_set or just a vector on which to unique/erase.
            Both variants will be more memory efficient and most like more performant.







            share|improve this answer















            share|improve this answer



            share|improve this answer








            edited Jul 17 at 17:53


























            answered Jul 17 at 16:14









            JVApen

            516212




            516212











            • A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
              – 1201ProgramAlarm
              Jul 17 at 17:44










            • I stand corrected, let me remove
              – JVApen
              Jul 17 at 17:52










            • Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
              – Reid Hayes
              Jul 17 at 18:12






            • 1




              Operator= doesn't delete the nodes
              – JVApen
              Jul 17 at 18:13










            • Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
              – Reid Hayes
              Jul 17 at 18:16
















            • A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
              – 1201ProgramAlarm
              Jul 17 at 17:44










            • I stand corrected, let me remove
              – JVApen
              Jul 17 at 17:52










            • Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
              – Reid Hayes
              Jul 17 at 18:12






            • 1




              Operator= doesn't delete the nodes
              – JVApen
              Jul 17 at 18:13










            • Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
              – Reid Hayes
              Jul 17 at 18:16















            A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
            – 1201ProgramAlarm
            Jul 17 at 17:44




            A trie is a specific data structure, used to hold an associative array. There's a specific tag for it on Stack Overflow. It's common to spell out "five" in "Rule of Five" and not use the digit.
            – 1201ProgramAlarm
            Jul 17 at 17:44












            I stand corrected, let me remove
            – JVApen
            Jul 17 at 17:52




            I stand corrected, let me remove
            – JVApen
            Jul 17 at 17:52












            Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
            – Reid Hayes
            Jul 17 at 18:12




            Thanks for your response! Are there really memory leaks? Doesn't ~TrieNode() recursively delete all of the nodes children before deleting itself?
            – Reid Hayes
            Jul 17 at 18:12




            1




            1




            Operator= doesn't delete the nodes
            – JVApen
            Jul 17 at 18:13




            Operator= doesn't delete the nodes
            – JVApen
            Jul 17 at 18:13












            Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
            – Reid Hayes
            Jul 17 at 18:16




            Ah! Thanks! I was thinking that clear would call the destructors, but obviously it doesn't since its values are pointers.
            – Reid Hayes
            Jul 17 at 18:16












             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199643%2fsimple-trie-class-in-c%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            Chat program with C++ and SFML

            Function to Return a JSON Like Objects Using VBA Collections and Arrays

            Will my employers contract hold up in court?