Trie Data Structure Implementation in C++11 using Smart Pointers — follow-up

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
11
down vote

favorite
3












Link to my first question.



I followed @JDługosz recommendations. How does it look? Do you have any further recommendations? Is it better (if possible) to replace shared_ptr with unique_ptr? How could someone extend it to use the Unicode character set?



#pragma once

#include <iostream>
#include <memory>
#include <string>

namespace forest
class trie
private:
struct Node
std::shared_ptr<Node> children[26];
bool end = false;
;
std::shared_ptr<Node> root = std::make_shared<Node>();
public:
void insert(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) slot = std::make_shared<Node>();
n = slot;

n->end = true;

bool search(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) return false;
n = slot;

return n && n->end;

;







share|improve this question



















  • Just added an answer with regard to shared_ptr to the previous questions
    – Harald Scheirich
    Apr 6 at 14:30
















up vote
11
down vote

favorite
3












Link to my first question.



I followed @JDługosz recommendations. How does it look? Do you have any further recommendations? Is it better (if possible) to replace shared_ptr with unique_ptr? How could someone extend it to use the Unicode character set?



#pragma once

#include <iostream>
#include <memory>
#include <string>

namespace forest
class trie
private:
struct Node
std::shared_ptr<Node> children[26];
bool end = false;
;
std::shared_ptr<Node> root = std::make_shared<Node>();
public:
void insert(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) slot = std::make_shared<Node>();
n = slot;

n->end = true;

bool search(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) return false;
n = slot;

return n && n->end;

;







share|improve this question



















  • Just added an answer with regard to shared_ptr to the previous questions
    – Harald Scheirich
    Apr 6 at 14:30












up vote
11
down vote

favorite
3









up vote
11
down vote

favorite
3






3





Link to my first question.



I followed @JDługosz recommendations. How does it look? Do you have any further recommendations? Is it better (if possible) to replace shared_ptr with unique_ptr? How could someone extend it to use the Unicode character set?



#pragma once

#include <iostream>
#include <memory>
#include <string>

namespace forest
class trie
private:
struct Node
std::shared_ptr<Node> children[26];
bool end = false;
;
std::shared_ptr<Node> root = std::make_shared<Node>();
public:
void insert(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) slot = std::make_shared<Node>();
n = slot;

n->end = true;

bool search(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) return false;
n = slot;

return n && n->end;

;







share|improve this question











Link to my first question.



I followed @JDługosz recommendations. How does it look? Do you have any further recommendations? Is it better (if possible) to replace shared_ptr with unique_ptr? How could someone extend it to use the Unicode character set?



#pragma once

#include <iostream>
#include <memory>
#include <string>

namespace forest
class trie
private:
struct Node
std::shared_ptr<Node> children[26];
bool end = false;
;
std::shared_ptr<Node> root = std::make_shared<Node>();
public:
void insert(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) slot = std::make_shared<Node>();
n = slot;

n->end = true;

bool search(const std::string & key)
std::shared_ptr<Node> n = root;
for (auto c : key)
int index = c - 'a';
auto& slot = n->children[index];
if (!slot) return false;
n = slot;

return n && n->end;

;









share|improve this question










share|improve this question




share|improve this question









asked Apr 6 at 14:25









xorz57

32912




32912











  • Just added an answer with regard to shared_ptr to the previous questions
    – Harald Scheirich
    Apr 6 at 14:30
















  • Just added an answer with regard to shared_ptr to the previous questions
    – Harald Scheirich
    Apr 6 at 14:30















Just added an answer with regard to shared_ptr to the previous questions
– Harald Scheirich
Apr 6 at 14:30




Just added an answer with regard to shared_ptr to the previous questions
– Harald Scheirich
Apr 6 at 14:30










2 Answers
2






active

oldest

votes

















up vote
3
down vote



accepted










pointers



I supposed the use of shared pointers was the possibility of sharing tree representation across instances, "persistence", and transactions. I’ve just watched some presentations on persistent data structures (and on Google’s trie, for that matter) so that was in my mind.



I agree with Frank about pointers. When calling code that operates on an object, it doesn’t care how the object is owned, so making it take an argument of type shared_ptr means it can’t take objects owned by unique_ptr, or that are direct members of larger structures, or on the stack, etc. So these arguments are passed as a reference.



In the Standard Guidelines, pointers are always non-owning. You mark them with owner<> to indicate otherwise.



I agree that the root does not need to be dynamically allocated. But you need to avoid having an ugly special case for the first node.



Node* n = &root;
for (auto c : key)
int index = to_index(c);
auto& slot = n->children[index];
if (!slot) slot = make_unique<Node>();
n= slot.get();



duplicated traversal code



I note that both functions have the same logic to traverse the tree, in their core. Normally, like in standard containers, there will be a single function that does this, and it is used by all the other functions.



If these are the only two functions you have, it’s probably not worth the effort. But if you have more (remove, find the closest match, etc.) then you should do that.



26



The first thing I noticed on your update was that you replaced the evil macro with a magic number, rather than a better way of defining a constant.



static constexpr size_t nodesize = 1+'z'-'a';
std::unique_ptr<Node> children[nodesize];


bad key



int index = c - 'a'; // note signed result wanted
if (index<0 || index>=nodesize) throw invalid_argument("oops");


Both functions go over a string in the same manner, so make that a common function.



int index = to_index(c);



Portability of character encoding



It’s been noted that the letters are not necessarily contiguous in the source character set. However, if you are writing in (original) EBCDIC you have worse problems and would not be able to type the characters into the source file. (I’ve discussed C++ on a primitive type of forum software running on an EBCDIC system that was lacking [ ] and a few others, and it is not simple.



The execution character set is distinct from the source character set, and depends on the locale. More generally, you can see that it depends on the source of the strings such as a saved file — if the file uses a character set that doesn’t use the same codes for letters as we expected, then things will go bad.



So, part of the specification is that the input strings will always be in UTF-8, or (sufficient for our purposes) is compatible with ASCII.



What about at compile-time though? The standard says that the value of a character literal 'a' is in the execution character set, not the source character set, which is good. Except that the execution character set is not known until run time, so how can it do that?



However, you can specify that a character is using UTF-8, regardless of any locale or whatnot going on in the compiler or target system.



static constexpr size_t nodesize = 1+u8'z'-u8'a';





share|improve this answer























  • Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
    – xorz57
    Apr 7 at 0:55










  • If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
    – xorz57
    Apr 7 at 0:57










  • make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
    – Snowhawk
    Apr 7 at 1:29











  • @xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
    – JDługosz
    Apr 7 at 3:20










  • Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
    – Millie Smith
    Apr 7 at 3:34


















up vote
11
down vote













Input sanitization



Your functions take a std::string as parameter, as such, they should be "well behaved" for any possible std::string passed to them. Note that well behaved does not mean that it has to "work", just that it should not break anything.



Specifically, what would happen if I passed the string "Hello" to that function? 'H' - 'a' is -25, ruh roh!



There's a few different ways to tackle this.



  • You could convert all upper-case letters to lower case, but that does not fix punctuation marks, spaces and numbers. I also personally think that wether the trie is case-sensitive or not should be the user's problem, not the trie itself.


  • You could bail out of the function if a unhandled character is hit.


  • Just expand children to 256 instead of 26 so that all legal values of char are handled properly. Sure your trie is going to be 5 times as big, but that's a fairly minor detail since it grows logarithmically.


edit: This last approach also makes the trie operate on raw data instead of characters, which makes it encoding-agnostic (which gives you your unicode support)



Avoid using shared_ptr unless absolutely necessary



shared pointers are clunky and heavy affairs. On top of this, using them sends a message to people reading the code: Ownership of the object is potentially shared between multiple owners, which is simply not the case here.



in your case, a std::unique_ptr<> is absolutely fine.



Your root does not need to be dynamically allocated



It's created at construction unconditionally, and destroyed at destruction unconditionally. On top of that, it does not make use of type erasure (polymorphism). As such, there's no reason for it not to be a normal member of trie.



std::shared_ptr<Node> root = std::make_shared<Node>();



becomes:



Node root;



This will require you to change your search and insert functions so that the first line becomes:



const Node* n = &root;



But that's fine because that's preferable once you move to unique_ptr anyways.



Edit: On that note:



Raw pointers are not evil



std::shared_ptr<Node> n = root;



We tend to teach people that "you should never use raw pointers". But I find that extremely naive. The real rule is "You should never have raw pointers with ownership".



There is absolutely nothing wrong with using raw pointers as long as it's understood that:



  • The pointer only refers to the pointed object without owning it.

  • The lifetime of the pointer assignment to the object is fully enclosed by the lifetime of whatever DOES own the object.

In your code using shared_ptr, using the following would have been 100% ok, and much better in my opinion:



const Node* n = root.get();



Mark non-mutating functions as const.



Your search() member function should not alter the trie in any way, so you should mark it as const like this:



bool search(const std::string & key) const {



There's a few subtle advantages for the compiler, but the main one is that if you goof up and accidentally do something that changes the trie, the compiler will tell you.



nitpick: private: is redundant here.



The namespace of a class is private by default. Fun fact: that's the only difference between a class and a struct.



nitpick: redundant null-checking



in your search function, when I read the last line:



return n && n->end;



My impression was "Oh! n can be possibly null in certain cases", which led me to search for the scenario where that might be the case. It's misleading to the reader.



Defensive programming can be useful at times, but this is just excessive.






share|improve this answer























  • Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
    – Konrad Rudolph
    Apr 6 at 16:38











  • @KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
    – Frank
    Apr 6 at 16:43











  • @Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
    – Konrad Rudolph
    Apr 6 at 16:48











  • @KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
    – Frank
    Apr 6 at 16:54










  • @Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
    – Konrad Rudolph
    Apr 6 at 17:21











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191413%2ftrie-data-structure-implementation-in-c11-using-smart-pointers-follow-up%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote



accepted










pointers



I supposed the use of shared pointers was the possibility of sharing tree representation across instances, "persistence", and transactions. I’ve just watched some presentations on persistent data structures (and on Google’s trie, for that matter) so that was in my mind.



I agree with Frank about pointers. When calling code that operates on an object, it doesn’t care how the object is owned, so making it take an argument of type shared_ptr means it can’t take objects owned by unique_ptr, or that are direct members of larger structures, or on the stack, etc. So these arguments are passed as a reference.



In the Standard Guidelines, pointers are always non-owning. You mark them with owner<> to indicate otherwise.



I agree that the root does not need to be dynamically allocated. But you need to avoid having an ugly special case for the first node.



Node* n = &root;
for (auto c : key)
int index = to_index(c);
auto& slot = n->children[index];
if (!slot) slot = make_unique<Node>();
n= slot.get();



duplicated traversal code



I note that both functions have the same logic to traverse the tree, in their core. Normally, like in standard containers, there will be a single function that does this, and it is used by all the other functions.



If these are the only two functions you have, it’s probably not worth the effort. But if you have more (remove, find the closest match, etc.) then you should do that.



26



The first thing I noticed on your update was that you replaced the evil macro with a magic number, rather than a better way of defining a constant.



static constexpr size_t nodesize = 1+'z'-'a';
std::unique_ptr<Node> children[nodesize];


bad key



int index = c - 'a'; // note signed result wanted
if (index<0 || index>=nodesize) throw invalid_argument("oops");


Both functions go over a string in the same manner, so make that a common function.



int index = to_index(c);



Portability of character encoding



It’s been noted that the letters are not necessarily contiguous in the source character set. However, if you are writing in (original) EBCDIC you have worse problems and would not be able to type the characters into the source file. (I’ve discussed C++ on a primitive type of forum software running on an EBCDIC system that was lacking [ ] and a few others, and it is not simple.



The execution character set is distinct from the source character set, and depends on the locale. More generally, you can see that it depends on the source of the strings such as a saved file — if the file uses a character set that doesn’t use the same codes for letters as we expected, then things will go bad.



So, part of the specification is that the input strings will always be in UTF-8, or (sufficient for our purposes) is compatible with ASCII.



What about at compile-time though? The standard says that the value of a character literal 'a' is in the execution character set, not the source character set, which is good. Except that the execution character set is not known until run time, so how can it do that?



However, you can specify that a character is using UTF-8, regardless of any locale or whatnot going on in the compiler or target system.



static constexpr size_t nodesize = 1+u8'z'-u8'a';





share|improve this answer























  • Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
    – xorz57
    Apr 7 at 0:55










  • If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
    – xorz57
    Apr 7 at 0:57










  • make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
    – Snowhawk
    Apr 7 at 1:29











  • @xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
    – JDługosz
    Apr 7 at 3:20










  • Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
    – Millie Smith
    Apr 7 at 3:34















up vote
3
down vote



accepted










pointers



I supposed the use of shared pointers was the possibility of sharing tree representation across instances, "persistence", and transactions. I’ve just watched some presentations on persistent data structures (and on Google’s trie, for that matter) so that was in my mind.



I agree with Frank about pointers. When calling code that operates on an object, it doesn’t care how the object is owned, so making it take an argument of type shared_ptr means it can’t take objects owned by unique_ptr, or that are direct members of larger structures, or on the stack, etc. So these arguments are passed as a reference.



In the Standard Guidelines, pointers are always non-owning. You mark them with owner<> to indicate otherwise.



I agree that the root does not need to be dynamically allocated. But you need to avoid having an ugly special case for the first node.



Node* n = &root;
for (auto c : key)
int index = to_index(c);
auto& slot = n->children[index];
if (!slot) slot = make_unique<Node>();
n= slot.get();



duplicated traversal code



I note that both functions have the same logic to traverse the tree, in their core. Normally, like in standard containers, there will be a single function that does this, and it is used by all the other functions.



If these are the only two functions you have, it’s probably not worth the effort. But if you have more (remove, find the closest match, etc.) then you should do that.



26



The first thing I noticed on your update was that you replaced the evil macro with a magic number, rather than a better way of defining a constant.



static constexpr size_t nodesize = 1+'z'-'a';
std::unique_ptr<Node> children[nodesize];


bad key



int index = c - 'a'; // note signed result wanted
if (index<0 || index>=nodesize) throw invalid_argument("oops");


Both functions go over a string in the same manner, so make that a common function.



int index = to_index(c);



Portability of character encoding



It’s been noted that the letters are not necessarily contiguous in the source character set. However, if you are writing in (original) EBCDIC you have worse problems and would not be able to type the characters into the source file. (I’ve discussed C++ on a primitive type of forum software running on an EBCDIC system that was lacking [ ] and a few others, and it is not simple.



The execution character set is distinct from the source character set, and depends on the locale. More generally, you can see that it depends on the source of the strings such as a saved file — if the file uses a character set that doesn’t use the same codes for letters as we expected, then things will go bad.



So, part of the specification is that the input strings will always be in UTF-8, or (sufficient for our purposes) is compatible with ASCII.



What about at compile-time though? The standard says that the value of a character literal 'a' is in the execution character set, not the source character set, which is good. Except that the execution character set is not known until run time, so how can it do that?



However, you can specify that a character is using UTF-8, regardless of any locale or whatnot going on in the compiler or target system.



static constexpr size_t nodesize = 1+u8'z'-u8'a';





share|improve this answer























  • Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
    – xorz57
    Apr 7 at 0:55










  • If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
    – xorz57
    Apr 7 at 0:57










  • make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
    – Snowhawk
    Apr 7 at 1:29











  • @xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
    – JDługosz
    Apr 7 at 3:20










  • Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
    – Millie Smith
    Apr 7 at 3:34













up vote
3
down vote



accepted







up vote
3
down vote



accepted






pointers



I supposed the use of shared pointers was the possibility of sharing tree representation across instances, "persistence", and transactions. I’ve just watched some presentations on persistent data structures (and on Google’s trie, for that matter) so that was in my mind.



I agree with Frank about pointers. When calling code that operates on an object, it doesn’t care how the object is owned, so making it take an argument of type shared_ptr means it can’t take objects owned by unique_ptr, or that are direct members of larger structures, or on the stack, etc. So these arguments are passed as a reference.



In the Standard Guidelines, pointers are always non-owning. You mark them with owner<> to indicate otherwise.



I agree that the root does not need to be dynamically allocated. But you need to avoid having an ugly special case for the first node.



Node* n = &root;
for (auto c : key)
int index = to_index(c);
auto& slot = n->children[index];
if (!slot) slot = make_unique<Node>();
n= slot.get();



duplicated traversal code



I note that both functions have the same logic to traverse the tree, in their core. Normally, like in standard containers, there will be a single function that does this, and it is used by all the other functions.



If these are the only two functions you have, it’s probably not worth the effort. But if you have more (remove, find the closest match, etc.) then you should do that.



26



The first thing I noticed on your update was that you replaced the evil macro with a magic number, rather than a better way of defining a constant.



static constexpr size_t nodesize = 1+'z'-'a';
std::unique_ptr<Node> children[nodesize];


bad key



int index = c - 'a'; // note signed result wanted
if (index<0 || index>=nodesize) throw invalid_argument("oops");


Both functions go over a string in the same manner, so make that a common function.



int index = to_index(c);



Portability of character encoding



It’s been noted that the letters are not necessarily contiguous in the source character set. However, if you are writing in (original) EBCDIC you have worse problems and would not be able to type the characters into the source file. (I’ve discussed C++ on a primitive type of forum software running on an EBCDIC system that was lacking [ ] and a few others, and it is not simple.



The execution character set is distinct from the source character set, and depends on the locale. More generally, you can see that it depends on the source of the strings such as a saved file — if the file uses a character set that doesn’t use the same codes for letters as we expected, then things will go bad.



So, part of the specification is that the input strings will always be in UTF-8, or (sufficient for our purposes) is compatible with ASCII.



What about at compile-time though? The standard says that the value of a character literal 'a' is in the execution character set, not the source character set, which is good. Except that the execution character set is not known until run time, so how can it do that?



However, you can specify that a character is using UTF-8, regardless of any locale or whatnot going on in the compiler or target system.



static constexpr size_t nodesize = 1+u8'z'-u8'a';





share|improve this answer















pointers



I supposed the use of shared pointers was the possibility of sharing tree representation across instances, "persistence", and transactions. I’ve just watched some presentations on persistent data structures (and on Google’s trie, for that matter) so that was in my mind.



I agree with Frank about pointers. When calling code that operates on an object, it doesn’t care how the object is owned, so making it take an argument of type shared_ptr means it can’t take objects owned by unique_ptr, or that are direct members of larger structures, or on the stack, etc. So these arguments are passed as a reference.



In the Standard Guidelines, pointers are always non-owning. You mark them with owner<> to indicate otherwise.



I agree that the root does not need to be dynamically allocated. But you need to avoid having an ugly special case for the first node.



Node* n = &root;
for (auto c : key)
int index = to_index(c);
auto& slot = n->children[index];
if (!slot) slot = make_unique<Node>();
n= slot.get();



duplicated traversal code



I note that both functions have the same logic to traverse the tree, in their core. Normally, like in standard containers, there will be a single function that does this, and it is used by all the other functions.



If these are the only two functions you have, it’s probably not worth the effort. But if you have more (remove, find the closest match, etc.) then you should do that.



26



The first thing I noticed on your update was that you replaced the evil macro with a magic number, rather than a better way of defining a constant.



static constexpr size_t nodesize = 1+'z'-'a';
std::unique_ptr<Node> children[nodesize];


bad key



int index = c - 'a'; // note signed result wanted
if (index<0 || index>=nodesize) throw invalid_argument("oops");


Both functions go over a string in the same manner, so make that a common function.



int index = to_index(c);



Portability of character encoding



It’s been noted that the letters are not necessarily contiguous in the source character set. However, if you are writing in (original) EBCDIC you have worse problems and would not be able to type the characters into the source file. (I’ve discussed C++ on a primitive type of forum software running on an EBCDIC system that was lacking [ ] and a few others, and it is not simple.



The execution character set is distinct from the source character set, and depends on the locale. More generally, you can see that it depends on the source of the strings such as a saved file — if the file uses a character set that doesn’t use the same codes for letters as we expected, then things will go bad.



So, part of the specification is that the input strings will always be in UTF-8, or (sufficient for our purposes) is compatible with ASCII.



What about at compile-time though? The standard says that the value of a character literal 'a' is in the execution character set, not the source character set, which is good. Except that the execution character set is not known until run time, so how can it do that?



However, you can specify that a character is using UTF-8, regardless of any locale or whatnot going on in the compiler or target system.



static constexpr size_t nodesize = 1+u8'z'-u8'a';






share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 7 at 23:04


























answered Apr 6 at 23:59









JDługosz

5,047731




5,047731











  • Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
    – xorz57
    Apr 7 at 0:55










  • If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
    – xorz57
    Apr 7 at 0:57










  • make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
    – Snowhawk
    Apr 7 at 1:29











  • @xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
    – JDługosz
    Apr 7 at 3:20










  • Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
    – Millie Smith
    Apr 7 at 3:34

















  • Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
    – xorz57
    Apr 7 at 0:55










  • If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
    – xorz57
    Apr 7 at 0:57










  • make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
    – Snowhawk
    Apr 7 at 1:29











  • @xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
    – JDługosz
    Apr 7 at 3:20










  • Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
    – Millie Smith
    Apr 7 at 3:34
















Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
– xorz57
Apr 7 at 0:55




Is it possible to avoid that std::make_unique because it uses c++14? I am asking this because initially, i wanted my library to use c++11 only. Does std::shared_ptr really hurt in this implementation?
– xorz57
Apr 7 at 0:55












If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
– xorz57
Apr 7 at 0:57




If at some point an exception is thrown the string would be half written I fear... that's something else that bothers me.
– xorz57
Apr 7 at 0:57












make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
– Snowhawk
Apr 7 at 1:29





make_unique doesn't require C++14 to implement. std::make_unique not being in C++11 was an oversight. std::shared_ptr will work, but it's overkill. Each subtrie in the node array is owned by one parent trie. You don't need the std::shared_ptr control block/ref-counting. As for providing the strong exception guarantee on insertion, Create a new tree for the insertion term and merge. You could even optimize this by prefix searching and only creating/transferring the new suffix.
– Snowhawk
Apr 7 at 1:29













@xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
– JDługosz
Apr 7 at 3:20




@xorz57 just add the missing def yourself, and use that. Performance: benchmark it both ways and see!
– JDługosz
Apr 7 at 3:20












Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
– Millie Smith
Apr 7 at 3:34





Btw, 'z'-'a' is not guaranteed to be 25 in C / C++. Only the order of 0 - 9 is guaranteed. It may not make a huge difference in practice, but I generally try to use "abcdefghijklmnopqrstuvwxyz" in some way (defining it somewhere and doing countof or strlen in this case).
– Millie Smith
Apr 7 at 3:34













up vote
11
down vote













Input sanitization



Your functions take a std::string as parameter, as such, they should be "well behaved" for any possible std::string passed to them. Note that well behaved does not mean that it has to "work", just that it should not break anything.



Specifically, what would happen if I passed the string "Hello" to that function? 'H' - 'a' is -25, ruh roh!



There's a few different ways to tackle this.



  • You could convert all upper-case letters to lower case, but that does not fix punctuation marks, spaces and numbers. I also personally think that wether the trie is case-sensitive or not should be the user's problem, not the trie itself.


  • You could bail out of the function if a unhandled character is hit.


  • Just expand children to 256 instead of 26 so that all legal values of char are handled properly. Sure your trie is going to be 5 times as big, but that's a fairly minor detail since it grows logarithmically.


edit: This last approach also makes the trie operate on raw data instead of characters, which makes it encoding-agnostic (which gives you your unicode support)



Avoid using shared_ptr unless absolutely necessary



shared pointers are clunky and heavy affairs. On top of this, using them sends a message to people reading the code: Ownership of the object is potentially shared between multiple owners, which is simply not the case here.



in your case, a std::unique_ptr<> is absolutely fine.



Your root does not need to be dynamically allocated



It's created at construction unconditionally, and destroyed at destruction unconditionally. On top of that, it does not make use of type erasure (polymorphism). As such, there's no reason for it not to be a normal member of trie.



std::shared_ptr<Node> root = std::make_shared<Node>();



becomes:



Node root;



This will require you to change your search and insert functions so that the first line becomes:



const Node* n = &root;



But that's fine because that's preferable once you move to unique_ptr anyways.



Edit: On that note:



Raw pointers are not evil



std::shared_ptr<Node> n = root;



We tend to teach people that "you should never use raw pointers". But I find that extremely naive. The real rule is "You should never have raw pointers with ownership".



There is absolutely nothing wrong with using raw pointers as long as it's understood that:



  • The pointer only refers to the pointed object without owning it.

  • The lifetime of the pointer assignment to the object is fully enclosed by the lifetime of whatever DOES own the object.

In your code using shared_ptr, using the following would have been 100% ok, and much better in my opinion:



const Node* n = root.get();



Mark non-mutating functions as const.



Your search() member function should not alter the trie in any way, so you should mark it as const like this:



bool search(const std::string & key) const {



There's a few subtle advantages for the compiler, but the main one is that if you goof up and accidentally do something that changes the trie, the compiler will tell you.



nitpick: private: is redundant here.



The namespace of a class is private by default. Fun fact: that's the only difference between a class and a struct.



nitpick: redundant null-checking



in your search function, when I read the last line:



return n && n->end;



My impression was "Oh! n can be possibly null in certain cases", which led me to search for the scenario where that might be the case. It's misleading to the reader.



Defensive programming can be useful at times, but this is just excessive.






share|improve this answer























  • Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
    – Konrad Rudolph
    Apr 6 at 16:38











  • @KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
    – Frank
    Apr 6 at 16:43











  • @Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
    – Konrad Rudolph
    Apr 6 at 16:48











  • @KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
    – Frank
    Apr 6 at 16:54










  • @Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
    – Konrad Rudolph
    Apr 6 at 17:21















up vote
11
down vote













Input sanitization



Your functions take a std::string as parameter, as such, they should be "well behaved" for any possible std::string passed to them. Note that well behaved does not mean that it has to "work", just that it should not break anything.



Specifically, what would happen if I passed the string "Hello" to that function? 'H' - 'a' is -25, ruh roh!



There's a few different ways to tackle this.



  • You could convert all upper-case letters to lower case, but that does not fix punctuation marks, spaces and numbers. I also personally think that wether the trie is case-sensitive or not should be the user's problem, not the trie itself.


  • You could bail out of the function if a unhandled character is hit.


  • Just expand children to 256 instead of 26 so that all legal values of char are handled properly. Sure your trie is going to be 5 times as big, but that's a fairly minor detail since it grows logarithmically.


edit: This last approach also makes the trie operate on raw data instead of characters, which makes it encoding-agnostic (which gives you your unicode support)



Avoid using shared_ptr unless absolutely necessary



shared pointers are clunky and heavy affairs. On top of this, using them sends a message to people reading the code: Ownership of the object is potentially shared between multiple owners, which is simply not the case here.



in your case, a std::unique_ptr<> is absolutely fine.



Your root does not need to be dynamically allocated



It's created at construction unconditionally, and destroyed at destruction unconditionally. On top of that, it does not make use of type erasure (polymorphism). As such, there's no reason for it not to be a normal member of trie.



std::shared_ptr<Node> root = std::make_shared<Node>();



becomes:



Node root;



This will require you to change your search and insert functions so that the first line becomes:



const Node* n = &root;



But that's fine because that's preferable once you move to unique_ptr anyways.



Edit: On that note:



Raw pointers are not evil



std::shared_ptr<Node> n = root;



We tend to teach people that "you should never use raw pointers". But I find that extremely naive. The real rule is "You should never have raw pointers with ownership".



There is absolutely nothing wrong with using raw pointers as long as it's understood that:



  • The pointer only refers to the pointed object without owning it.

  • The lifetime of the pointer assignment to the object is fully enclosed by the lifetime of whatever DOES own the object.

In your code using shared_ptr, using the following would have been 100% ok, and much better in my opinion:



const Node* n = root.get();



Mark non-mutating functions as const.



Your search() member function should not alter the trie in any way, so you should mark it as const like this:



bool search(const std::string & key) const {



There's a few subtle advantages for the compiler, but the main one is that if you goof up and accidentally do something that changes the trie, the compiler will tell you.



nitpick: private: is redundant here.



The namespace of a class is private by default. Fun fact: that's the only difference between a class and a struct.



nitpick: redundant null-checking



in your search function, when I read the last line:



return n && n->end;



My impression was "Oh! n can be possibly null in certain cases", which led me to search for the scenario where that might be the case. It's misleading to the reader.



Defensive programming can be useful at times, but this is just excessive.






share|improve this answer























  • Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
    – Konrad Rudolph
    Apr 6 at 16:38











  • @KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
    – Frank
    Apr 6 at 16:43











  • @Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
    – Konrad Rudolph
    Apr 6 at 16:48











  • @KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
    – Frank
    Apr 6 at 16:54










  • @Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
    – Konrad Rudolph
    Apr 6 at 17:21













up vote
11
down vote










up vote
11
down vote









Input sanitization



Your functions take a std::string as parameter, as such, they should be "well behaved" for any possible std::string passed to them. Note that well behaved does not mean that it has to "work", just that it should not break anything.



Specifically, what would happen if I passed the string "Hello" to that function? 'H' - 'a' is -25, ruh roh!



There's a few different ways to tackle this.



  • You could convert all upper-case letters to lower case, but that does not fix punctuation marks, spaces and numbers. I also personally think that wether the trie is case-sensitive or not should be the user's problem, not the trie itself.


  • You could bail out of the function if a unhandled character is hit.


  • Just expand children to 256 instead of 26 so that all legal values of char are handled properly. Sure your trie is going to be 5 times as big, but that's a fairly minor detail since it grows logarithmically.


edit: This last approach also makes the trie operate on raw data instead of characters, which makes it encoding-agnostic (which gives you your unicode support)



Avoid using shared_ptr unless absolutely necessary



shared pointers are clunky and heavy affairs. On top of this, using them sends a message to people reading the code: Ownership of the object is potentially shared between multiple owners, which is simply not the case here.



in your case, a std::unique_ptr<> is absolutely fine.



Your root does not need to be dynamically allocated



It's created at construction unconditionally, and destroyed at destruction unconditionally. On top of that, it does not make use of type erasure (polymorphism). As such, there's no reason for it not to be a normal member of trie.



std::shared_ptr<Node> root = std::make_shared<Node>();



becomes:



Node root;



This will require you to change your search and insert functions so that the first line becomes:



const Node* n = &root;



But that's fine because that's preferable once you move to unique_ptr anyways.



Edit: On that note:



Raw pointers are not evil



std::shared_ptr<Node> n = root;



We tend to teach people that "you should never use raw pointers". But I find that extremely naive. The real rule is "You should never have raw pointers with ownership".



There is absolutely nothing wrong with using raw pointers as long as it's understood that:



  • The pointer only refers to the pointed object without owning it.

  • The lifetime of the pointer assignment to the object is fully enclosed by the lifetime of whatever DOES own the object.

In your code using shared_ptr, using the following would have been 100% ok, and much better in my opinion:



const Node* n = root.get();



Mark non-mutating functions as const.



Your search() member function should not alter the trie in any way, so you should mark it as const like this:



bool search(const std::string & key) const {



There's a few subtle advantages for the compiler, but the main one is that if you goof up and accidentally do something that changes the trie, the compiler will tell you.



nitpick: private: is redundant here.



The namespace of a class is private by default. Fun fact: that's the only difference between a class and a struct.



nitpick: redundant null-checking



in your search function, when I read the last line:



return n && n->end;



My impression was "Oh! n can be possibly null in certain cases", which led me to search for the scenario where that might be the case. It's misleading to the reader.



Defensive programming can be useful at times, but this is just excessive.






share|improve this answer















Input sanitization



Your functions take a std::string as parameter, as such, they should be "well behaved" for any possible std::string passed to them. Note that well behaved does not mean that it has to "work", just that it should not break anything.



Specifically, what would happen if I passed the string "Hello" to that function? 'H' - 'a' is -25, ruh roh!



There's a few different ways to tackle this.



  • You could convert all upper-case letters to lower case, but that does not fix punctuation marks, spaces and numbers. I also personally think that wether the trie is case-sensitive or not should be the user's problem, not the trie itself.


  • You could bail out of the function if a unhandled character is hit.


  • Just expand children to 256 instead of 26 so that all legal values of char are handled properly. Sure your trie is going to be 5 times as big, but that's a fairly minor detail since it grows logarithmically.


edit: This last approach also makes the trie operate on raw data instead of characters, which makes it encoding-agnostic (which gives you your unicode support)



Avoid using shared_ptr unless absolutely necessary



shared pointers are clunky and heavy affairs. On top of this, using them sends a message to people reading the code: Ownership of the object is potentially shared between multiple owners, which is simply not the case here.



in your case, a std::unique_ptr<> is absolutely fine.



Your root does not need to be dynamically allocated



It's created at construction unconditionally, and destroyed at destruction unconditionally. On top of that, it does not make use of type erasure (polymorphism). As such, there's no reason for it not to be a normal member of trie.



std::shared_ptr<Node> root = std::make_shared<Node>();



becomes:



Node root;



This will require you to change your search and insert functions so that the first line becomes:



const Node* n = &root;



But that's fine because that's preferable once you move to unique_ptr anyways.



Edit: On that note:



Raw pointers are not evil



std::shared_ptr<Node> n = root;



We tend to teach people that "you should never use raw pointers". But I find that extremely naive. The real rule is "You should never have raw pointers with ownership".



There is absolutely nothing wrong with using raw pointers as long as it's understood that:



  • The pointer only refers to the pointed object without owning it.

  • The lifetime of the pointer assignment to the object is fully enclosed by the lifetime of whatever DOES own the object.

In your code using shared_ptr, using the following would have been 100% ok, and much better in my opinion:



const Node* n = root.get();



Mark non-mutating functions as const.



Your search() member function should not alter the trie in any way, so you should mark it as const like this:



bool search(const std::string & key) const {



There's a few subtle advantages for the compiler, but the main one is that if you goof up and accidentally do something that changes the trie, the compiler will tell you.



nitpick: private: is redundant here.



The namespace of a class is private by default. Fun fact: that's the only difference between a class and a struct.



nitpick: redundant null-checking



in your search function, when I read the last line:



return n && n->end;



My impression was "Oh! n can be possibly null in certain cases", which led me to search for the scenario where that might be the case. It's misleading to the reader.



Defensive programming can be useful at times, but this is just excessive.







share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 6 at 16:36


























answered Apr 6 at 14:54









Frank

2,947319




2,947319











  • Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
    – Konrad Rudolph
    Apr 6 at 16:38











  • @KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
    – Frank
    Apr 6 at 16:43











  • @Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
    – Konrad Rudolph
    Apr 6 at 16:48











  • @KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
    – Frank
    Apr 6 at 16:54










  • @Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
    – Konrad Rudolph
    Apr 6 at 17:21

















  • Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
    – Konrad Rudolph
    Apr 6 at 16:38











  • @KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
    – Frank
    Apr 6 at 16:43











  • @Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
    – Konrad Rudolph
    Apr 6 at 16:48











  • @KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
    – Frank
    Apr 6 at 16:54










  • @Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
    – Konrad Rudolph
    Apr 6 at 17:21
















Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
– Konrad Rudolph
Apr 6 at 16:38





Great answer. My only complaint: you are too timid. You’re saying that “in [your] opinion” it would be better to use a raw pointer (rather than a smart pointer) in situation you point out. But this isn’t a matter of opinion. Using a std::shared_ptr here is wrong (because it implies incorrect ownership semantics). At a stretch you could use a std::weak_ptr here but I’d argue that even that is wrong since that, too, implies a semantic that doesn’t hold: namely that the owner might relinquish the resource while we’re accessing it. No. This is a raw pointer.
– Konrad Rudolph
Apr 6 at 16:38













@KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
– Frank
Apr 6 at 16:43





@KonradRudolph A strong wrong would imply that the code is not well formed and fundamentally incorrect (like the out of bounds error I pointed out). This is not the case here. It's bad code, and should be changed, but it's not wrong code, it works. But I tend to agree, it's pretty black-and-white here.
– Frank
Apr 6 at 16:43













@Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
– Konrad Rudolph
Apr 6 at 16:48





@Frank Ill-formedness and runtime errors are merely specific subsets of “wrong”. For instance, a program can be well-formed, have no runtime errors, but give the wrong result. Or it can give the right result purely by accident. This is an instance of the latter: using a shared pointer happens to work here but it violates the use-case that it was designed for (due to the implicit specification of the algorithm, which has no ownership here). This use-case can’t be enforced programmatically but that doesn’t make the usage less wrong, merely unenforceable.
– Konrad Rudolph
Apr 6 at 16:48













@KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
– Frank
Apr 6 at 16:54




@KonradRudolph But temporarily extending the lifetime of a shared object during transient processing is a legit (and common if you ever used ASIO) use-case. Using that technique here falls under the overkill, excessively defensive category of errors, similar to that null check.
– Frank
Apr 6 at 16:54












@Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
– Konrad Rudolph
Apr 6 at 17:21





@Frank There’s no lifetime extension happening here. This is a pleonasm. Pleonasms, unless they serve a purpose in documenting expected behaviour (= defensive coding), are logic errors that simply happen to have no effect on the correctness of the result.
– Konrad Rudolph
Apr 6 at 17:21













 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191413%2ftrie-data-structure-implementation-in-c11-using-smart-pointers-follow-up%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Chat program with C++ and SFML

Function to Return a JSON Like Objects Using VBA Collections and Arrays

Will my employers contract hold up in court?