Determine if one string is a permutation of the another

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
12
down vote

favorite
3












Can anyone give me some feedback if I'm using std::unordered_multimap optimally? I don't know the ins-and-outs of the std::unordered_multimap implementation.



I'm wondering if comparing two maps iterates through each of the elements in the map, making the algorithm $O(N^2)$ or if it just compares the hash of the map or something making the algorithm $O(N)$. Or would it just be better to use an array if I'm given this question in an interview?



bool is_permutation(const std::string& s1, const std::string& s2)

assert(s1.size() && s2.size());

if(s1.length() != s2.length())
return false;

std::unordered_multimap<int, char> m1, m2;

for(int i = 0; i < s1.length(); ++i)

m1.insert(s1[i], s1[i]);
m2.insert(s2[i], s2[i]);


if(m1 == m2)
return true;

return false;







share|improve this question





















  • Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
    – Pedro A
    Apr 18 at 0:05










  • You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
    – JDługosz
    Apr 18 at 6:14






  • 4




    Sort the characters in both strings and compare them?
    – Thorbjørn Ravn Andersen
    Apr 18 at 8:56










  • Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
    – Toby Speight
    Apr 23 at 19:14
















up vote
12
down vote

favorite
3












Can anyone give me some feedback if I'm using std::unordered_multimap optimally? I don't know the ins-and-outs of the std::unordered_multimap implementation.



I'm wondering if comparing two maps iterates through each of the elements in the map, making the algorithm $O(N^2)$ or if it just compares the hash of the map or something making the algorithm $O(N)$. Or would it just be better to use an array if I'm given this question in an interview?



bool is_permutation(const std::string& s1, const std::string& s2)

assert(s1.size() && s2.size());

if(s1.length() != s2.length())
return false;

std::unordered_multimap<int, char> m1, m2;

for(int i = 0; i < s1.length(); ++i)

m1.insert(s1[i], s1[i]);
m2.insert(s2[i], s2[i]);


if(m1 == m2)
return true;

return false;







share|improve this question





















  • Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
    – Pedro A
    Apr 18 at 0:05










  • You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
    – JDługosz
    Apr 18 at 6:14






  • 4




    Sort the characters in both strings and compare them?
    – Thorbjørn Ravn Andersen
    Apr 18 at 8:56










  • Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
    – Toby Speight
    Apr 23 at 19:14












up vote
12
down vote

favorite
3









up vote
12
down vote

favorite
3






3





Can anyone give me some feedback if I'm using std::unordered_multimap optimally? I don't know the ins-and-outs of the std::unordered_multimap implementation.



I'm wondering if comparing two maps iterates through each of the elements in the map, making the algorithm $O(N^2)$ or if it just compares the hash of the map or something making the algorithm $O(N)$. Or would it just be better to use an array if I'm given this question in an interview?



bool is_permutation(const std::string& s1, const std::string& s2)

assert(s1.size() && s2.size());

if(s1.length() != s2.length())
return false;

std::unordered_multimap<int, char> m1, m2;

for(int i = 0; i < s1.length(); ++i)

m1.insert(s1[i], s1[i]);
m2.insert(s2[i], s2[i]);


if(m1 == m2)
return true;

return false;







share|improve this question













Can anyone give me some feedback if I'm using std::unordered_multimap optimally? I don't know the ins-and-outs of the std::unordered_multimap implementation.



I'm wondering if comparing two maps iterates through each of the elements in the map, making the algorithm $O(N^2)$ or if it just compares the hash of the map or something making the algorithm $O(N)$. Or would it just be better to use an array if I'm given this question in an interview?



bool is_permutation(const std::string& s1, const std::string& s2)

assert(s1.size() && s2.size());

if(s1.length() != s2.length())
return false;

std::unordered_multimap<int, char> m1, m2;

for(int i = 0; i < s1.length(); ++i)

m1.insert(s1[i], s1[i]);
m2.insert(s2[i], s2[i]);


if(m1 == m2)
return true;

return false;









share|improve this question












share|improve this question




share|improve this question








edited Apr 18 at 23:38









Jamal♦

30.1k11114225




30.1k11114225









asked Apr 17 at 20:20









John DeBord

634




634











  • Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
    – Pedro A
    Apr 18 at 0:05










  • You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
    – JDługosz
    Apr 18 at 6:14






  • 4




    Sort the characters in both strings and compare them?
    – Thorbjørn Ravn Andersen
    Apr 18 at 8:56










  • Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
    – Toby Speight
    Apr 23 at 19:14
















  • Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
    – Pedro A
    Apr 18 at 0:05










  • You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
    – JDługosz
    Apr 18 at 6:14






  • 4




    Sort the characters in both strings and compare them?
    – Thorbjørn Ravn Andersen
    Apr 18 at 8:56










  • Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
    – Toby Speight
    Apr 23 at 19:14















Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
– Pedro A
Apr 18 at 0:05




Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
– Pedro A
Apr 18 at 0:05












You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
– JDługosz
Apr 18 at 6:14




You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
– JDługosz
Apr 18 at 6:14




4




4




Sort the characters in both strings and compare them?
– Thorbjørn Ravn Andersen
Apr 18 at 8:56




Sort the characters in both strings and compare them?
– Thorbjørn Ravn Andersen
Apr 18 at 8:56












Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
– Toby Speight
Apr 23 at 19:14




Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel?
– Toby Speight
Apr 23 at 19:14










3 Answers
3






active

oldest

votes

















up vote
14
down vote



accepted










Comparing two maps is linear, so that's not a big problem.



I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset, or you could use a unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.



Personally, I think it's generally simpler to just sort the two strings, then compare them:



bool is_permutation(std::string s1, std::string s2) 
if (s1.length() != s2.length())
return false;

std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;



A version making good use of an unordered_map:



bool is_perm2(std::string const &a, std::string const &b) 
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.



I have not, however, gotten nearly as good of results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similarly (or probably even worse).



For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:



enter image description here



[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]



So this shows the unordered_map having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.



Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.



Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.



There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.



Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).



Edit 2:



After more testing, I have to agree with the consensus that counting characters in an array is a better choice:



bool is_perm3(std::string const &a, std::string const &b) 
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;



I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector vs. std::array in his test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).






share|improve this answer























  • Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
    – John DeBord
    Apr 17 at 20:30











  • @JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
    – Jerry Coffin
    Apr 17 at 20:31







  • 5




    Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
    – vnp
    Apr 17 at 21:00






  • 3




    @vnp: I am betting that sorting is the quickest for short strings.
    – Martin York
    Apr 17 at 23:26










  • @JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
    – Martin York
    Apr 17 at 23:30

















up vote
15
down vote













I would say yes, use an array.



Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.



You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. Precisely if the two strings are permutations, the array will finish with all zeroes.



As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.



bool is_permutation(const std::string& s1, const std::string& s2)

if(s1.length() != s2.length())
return false;

// There are usually 256 possible characters but on rare architectures may be more.
constexpr possible_char_count = std::numeric_limits<unsigned char>::max()+1;
std::array<size_t, possible_char_count > m1 = ; // Zero initialized all places

for(char c : s1) // Count up for each character in the first string
m1[static_cast<unsigned char>(c)]++;

for(char c : s2) // Count down for each character in the second string
m1[static_cast<unsigned char>(c)]--;


for (size_t t : m1)
// If any character appears a net nonzero number of times,
// the two strings are not anagrams.
if (t != 0)
return false;


return true;



Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.



One reason to prefer a map to an array is that when the input strings are very short or very homogenous, then the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and inialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.






share|improve this answer



















  • 3




    Note that 1 << CHAR_BIT is not guaranteed to be 256.
    – wchargin
    Apr 18 at 7:48










  • I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
    – Matthieu M.
    Apr 18 at 13:12






  • 1




    For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
    – Matthieu M.
    Apr 18 at 13:18

















up vote
6
down vote













You have got the basics.

Though @Jerry Coffin points out some good points.



I would take this one step further.

A std::unordered_multimap is good for the general case. Where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char. Very simple and a very small number of them (even if you use all 256).



So I would change to use an array of integers. One integer for each character. Though lets use a vector do easier comparisons.



// Stealing @Jerry Coffin code
bool is_perm2(std::string const &a, std::string const &b)

if (a.length() != b.length())
return false;

std::array<int, 1 + UCHAR_MAX> s1;
std::array<int, 1 + UCHAR_MAX> s2;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



Notes:



  • Complexity O(n).

  • Overhead per iteration very small.

  • Overhead high for initialization.





share|improve this answer



















  • 2




    What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
    – Toby Speight
    Apr 18 at 9:27










  • Shouldn't that be char or preferable auto for c?
    – Deduplicator
    Apr 18 at 16:21










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192327%2fdetermine-if-one-string-is-a-permutation-of-the-another%23new-answer', 'question_page');

);

Post as a guest






























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
14
down vote



accepted










Comparing two maps is linear, so that's not a big problem.



I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset, or you could use a unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.



Personally, I think it's generally simpler to just sort the two strings, then compare them:



bool is_permutation(std::string s1, std::string s2) 
if (s1.length() != s2.length())
return false;

std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;



A version making good use of an unordered_map:



bool is_perm2(std::string const &a, std::string const &b) 
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.



I have not, however, gotten nearly as good of results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similarly (or probably even worse).



For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:



enter image description here



[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]



So this shows the unordered_map having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.



Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.



Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.



There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.



Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).



Edit 2:



After more testing, I have to agree with the consensus that counting characters in an array is a better choice:



bool is_perm3(std::string const &a, std::string const &b) 
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;



I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector vs. std::array in his test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).






share|improve this answer























  • Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
    – John DeBord
    Apr 17 at 20:30











  • @JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
    – Jerry Coffin
    Apr 17 at 20:31







  • 5




    Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
    – vnp
    Apr 17 at 21:00






  • 3




    @vnp: I am betting that sorting is the quickest for short strings.
    – Martin York
    Apr 17 at 23:26










  • @JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
    – Martin York
    Apr 17 at 23:30














up vote
14
down vote



accepted










Comparing two maps is linear, so that's not a big problem.



I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset, or you could use a unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.



Personally, I think it's generally simpler to just sort the two strings, then compare them:



bool is_permutation(std::string s1, std::string s2) 
if (s1.length() != s2.length())
return false;

std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;



A version making good use of an unordered_map:



bool is_perm2(std::string const &a, std::string const &b) 
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.



I have not, however, gotten nearly as good of results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similarly (or probably even worse).



For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:



enter image description here



[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]



So this shows the unordered_map having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.



Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.



Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.



There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.



Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).



Edit 2:



After more testing, I have to agree with the consensus that counting characters in an array is a better choice:



bool is_perm3(std::string const &a, std::string const &b) 
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;



I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector vs. std::array in his test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).






share|improve this answer























  • Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
    – John DeBord
    Apr 17 at 20:30











  • @JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
    – Jerry Coffin
    Apr 17 at 20:31







  • 5




    Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
    – vnp
    Apr 17 at 21:00






  • 3




    @vnp: I am betting that sorting is the quickest for short strings.
    – Martin York
    Apr 17 at 23:26










  • @JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
    – Martin York
    Apr 17 at 23:30












up vote
14
down vote



accepted







up vote
14
down vote



accepted






Comparing two maps is linear, so that's not a big problem.



I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset, or you could use a unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.



Personally, I think it's generally simpler to just sort the two strings, then compare them:



bool is_permutation(std::string s1, std::string s2) 
if (s1.length() != s2.length())
return false;

std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;



A version making good use of an unordered_map:



bool is_perm2(std::string const &a, std::string const &b) 
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.



I have not, however, gotten nearly as good of results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similarly (or probably even worse).



For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:



enter image description here



[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]



So this shows the unordered_map having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.



Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.



Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.



There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.



Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).



Edit 2:



After more testing, I have to agree with the consensus that counting characters in an array is a better choice:



bool is_perm3(std::string const &a, std::string const &b) 
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;



I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector vs. std::array in his test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).






share|improve this answer















Comparing two maps is linear, so that's not a big problem.



I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset, or you could use a unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.



Personally, I think it's generally simpler to just sort the two strings, then compare them:



bool is_permutation(std::string s1, std::string s2) 
if (s1.length() != s2.length())
return false;

std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;



A version making good use of an unordered_map:



bool is_perm2(std::string const &a, std::string const &b) 
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.



I have not, however, gotten nearly as good of results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similarly (or probably even worse).



For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:



enter image description here



[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]



So this shows the unordered_map having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.



Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.



Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.



There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.



Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).



Edit 2:



After more testing, I have to agree with the consensus that counting characters in an array is a better choice:



bool is_perm3(std::string const &a, std::string const &b) 
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;

if (a.length() != b.length())
return false;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;



I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector vs. std::array in his test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).







share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 18 at 16:40


























answered Apr 17 at 20:26









Jerry Coffin

27.4k360123




27.4k360123











  • Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
    – John DeBord
    Apr 17 at 20:30











  • @JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
    – Jerry Coffin
    Apr 17 at 20:31







  • 5




    Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
    – vnp
    Apr 17 at 21:00






  • 3




    @vnp: I am betting that sorting is the quickest for short strings.
    – Martin York
    Apr 17 at 23:26










  • @JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
    – Martin York
    Apr 17 at 23:30
















  • Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
    – John DeBord
    Apr 17 at 20:30











  • @JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
    – Jerry Coffin
    Apr 17 at 20:31







  • 5




    Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
    – vnp
    Apr 17 at 21:00






  • 3




    @vnp: I am betting that sorting is the quickest for short strings.
    – Martin York
    Apr 17 at 23:26










  • @JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
    – Martin York
    Apr 17 at 23:30















Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
– John DeBord
Apr 17 at 20:30





Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
– John DeBord
Apr 17 at 20:30













@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
– Jerry Coffin
Apr 17 at 20:31





@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of unordered_multimap.
– Jerry Coffin
Apr 17 at 20:31





5




5




Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
– vnp
Apr 17 at 21:00




Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
– vnp
Apr 17 at 21:00




3




3




@vnp: I am betting that sorting is the quickest for short strings.
– Martin York
Apr 17 at 23:26




@vnp: I am betting that sorting is the quickest for short strings.
– Martin York
Apr 17 at 23:26












@JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
– Martin York
Apr 17 at 23:30




@JerryCoffin Surprised with array results. Did you just replace the unordered_map with C-Array?
– Martin York
Apr 17 at 23:30












up vote
15
down vote













I would say yes, use an array.



Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.



You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. Precisely if the two strings are permutations, the array will finish with all zeroes.



As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.



bool is_permutation(const std::string& s1, const std::string& s2)

if(s1.length() != s2.length())
return false;

// There are usually 256 possible characters but on rare architectures may be more.
constexpr possible_char_count = std::numeric_limits<unsigned char>::max()+1;
std::array<size_t, possible_char_count > m1 = ; // Zero initialized all places

for(char c : s1) // Count up for each character in the first string
m1[static_cast<unsigned char>(c)]++;

for(char c : s2) // Count down for each character in the second string
m1[static_cast<unsigned char>(c)]--;


for (size_t t : m1)
// If any character appears a net nonzero number of times,
// the two strings are not anagrams.
if (t != 0)
return false;


return true;



Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.



One reason to prefer a map to an array is that when the input strings are very short or very homogenous, then the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and inialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.






share|improve this answer



















  • 3




    Note that 1 << CHAR_BIT is not guaranteed to be 256.
    – wchargin
    Apr 18 at 7:48










  • I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
    – Matthieu M.
    Apr 18 at 13:12






  • 1




    For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
    – Matthieu M.
    Apr 18 at 13:18














up vote
15
down vote













I would say yes, use an array.



Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.



You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. Precisely if the two strings are permutations, the array will finish with all zeroes.



As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.



bool is_permutation(const std::string& s1, const std::string& s2)

if(s1.length() != s2.length())
return false;

// There are usually 256 possible characters but on rare architectures may be more.
constexpr possible_char_count = std::numeric_limits<unsigned char>::max()+1;
std::array<size_t, possible_char_count > m1 = ; // Zero initialized all places

for(char c : s1) // Count up for each character in the first string
m1[static_cast<unsigned char>(c)]++;

for(char c : s2) // Count down for each character in the second string
m1[static_cast<unsigned char>(c)]--;


for (size_t t : m1)
// If any character appears a net nonzero number of times,
// the two strings are not anagrams.
if (t != 0)
return false;


return true;



Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.



One reason to prefer a map to an array is that when the input strings are very short or very homogenous, then the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and inialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.






share|improve this answer



















  • 3




    Note that 1 << CHAR_BIT is not guaranteed to be 256.
    – wchargin
    Apr 18 at 7:48










  • I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
    – Matthieu M.
    Apr 18 at 13:12






  • 1




    For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
    – Matthieu M.
    Apr 18 at 13:18












up vote
15
down vote










up vote
15
down vote









I would say yes, use an array.



Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.



You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. Precisely if the two strings are permutations, the array will finish with all zeroes.



As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.



bool is_permutation(const std::string& s1, const std::string& s2)

if(s1.length() != s2.length())
return false;

// There are usually 256 possible characters but on rare architectures may be more.
constexpr possible_char_count = std::numeric_limits<unsigned char>::max()+1;
std::array<size_t, possible_char_count > m1 = ; // Zero initialized all places

for(char c : s1) // Count up for each character in the first string
m1[static_cast<unsigned char>(c)]++;

for(char c : s2) // Count down for each character in the second string
m1[static_cast<unsigned char>(c)]--;


for (size_t t : m1)
// If any character appears a net nonzero number of times,
// the two strings are not anagrams.
if (t != 0)
return false;


return true;



Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.



One reason to prefer a map to an array is that when the input strings are very short or very homogenous, then the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and inialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.






share|improve this answer















I would say yes, use an array.



Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.



You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. Precisely if the two strings are permutations, the array will finish with all zeroes.



As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.



bool is_permutation(const std::string& s1, const std::string& s2)

if(s1.length() != s2.length())
return false;

// There are usually 256 possible characters but on rare architectures may be more.
constexpr possible_char_count = std::numeric_limits<unsigned char>::max()+1;
std::array<size_t, possible_char_count > m1 = ; // Zero initialized all places

for(char c : s1) // Count up for each character in the first string
m1[static_cast<unsigned char>(c)]++;

for(char c : s2) // Count down for each character in the second string
m1[static_cast<unsigned char>(c)]--;


for (size_t t : m1)
// If any character appears a net nonzero number of times,
// the two strings are not anagrams.
if (t != 0)
return false;


return true;



Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.



One reason to prefer a map to an array is that when the input strings are very short or very homogenous, then the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and inialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.







share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 18 at 22:04


























answered Apr 17 at 23:24









Josiah

3,182326




3,182326







  • 3




    Note that 1 << CHAR_BIT is not guaranteed to be 256.
    – wchargin
    Apr 18 at 7:48










  • I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
    – Matthieu M.
    Apr 18 at 13:12






  • 1




    For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
    – Matthieu M.
    Apr 18 at 13:18












  • 3




    Note that 1 << CHAR_BIT is not guaranteed to be 256.
    – wchargin
    Apr 18 at 7:48










  • I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
    – Matthieu M.
    Apr 18 at 13:12






  • 1




    For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
    – Matthieu M.
    Apr 18 at 13:18







3




3




Note that 1 << CHAR_BIT is not guaranteed to be 256.
– wchargin
Apr 18 at 7:48




Note that 1 << CHAR_BIT is not guaranteed to be 256.
– wchargin
Apr 18 at 7:48












I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
– Matthieu M.
Apr 18 at 13:12




I would recommend using something else than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead.
– Matthieu M.
Apr 18 at 13:12




1




1




For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
– Matthieu M.
Apr 18 at 13:18




For those following at home; the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected (1) the initialization and final checks are optimized and (2) the int versions requires only half as many iterations.
– Matthieu M.
Apr 18 at 13:18










up vote
6
down vote













You have got the basics.

Though @Jerry Coffin points out some good points.



I would take this one step further.

A std::unordered_multimap is good for the general case. Where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char. Very simple and a very small number of them (even if you use all 256).



So I would change to use an array of integers. One integer for each character. Though lets use a vector do easier comparisons.



// Stealing @Jerry Coffin code
bool is_perm2(std::string const &a, std::string const &b)

if (a.length() != b.length())
return false;

std::array<int, 1 + UCHAR_MAX> s1;
std::array<int, 1 + UCHAR_MAX> s2;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



Notes:



  • Complexity O(n).

  • Overhead per iteration very small.

  • Overhead high for initialization.





share|improve this answer



















  • 2




    What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
    – Toby Speight
    Apr 18 at 9:27










  • Shouldn't that be char or preferable auto for c?
    – Deduplicator
    Apr 18 at 16:21














up vote
6
down vote













You have got the basics.

Though @Jerry Coffin points out some good points.



I would take this one step further.

A std::unordered_multimap is good for the general case. Where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char. Very simple and a very small number of them (even if you use all 256).



So I would change to use an array of integers. One integer for each character. Though lets use a vector do easier comparisons.



// Stealing @Jerry Coffin code
bool is_perm2(std::string const &a, std::string const &b)

if (a.length() != b.length())
return false;

std::array<int, 1 + UCHAR_MAX> s1;
std::array<int, 1 + UCHAR_MAX> s2;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



Notes:



  • Complexity O(n).

  • Overhead per iteration very small.

  • Overhead high for initialization.





share|improve this answer



















  • 2




    What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
    – Toby Speight
    Apr 18 at 9:27










  • Shouldn't that be char or preferable auto for c?
    – Deduplicator
    Apr 18 at 16:21












up vote
6
down vote










up vote
6
down vote









You have got the basics.

Though @Jerry Coffin points out some good points.



I would take this one step further.

A std::unordered_multimap is good for the general case. Where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char. Very simple and a very small number of them (even if you use all 256).



So I would change to use an array of integers. One integer for each character. Though lets use a vector do easier comparisons.



// Stealing @Jerry Coffin code
bool is_perm2(std::string const &a, std::string const &b)

if (a.length() != b.length())
return false;

std::array<int, 1 + UCHAR_MAX> s1;
std::array<int, 1 + UCHAR_MAX> s2;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



Notes:



  • Complexity O(n).

  • Overhead per iteration very small.

  • Overhead high for initialization.





share|improve this answer















You have got the basics.

Though @Jerry Coffin points out some good points.



I would take this one step further.

A std::unordered_multimap is good for the general case. Where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char. Very simple and a very small number of them (even if you use all 256).



So I would change to use an array of integers. One integer for each character. Though lets use a vector do easier comparisons.



// Stealing @Jerry Coffin code
bool is_perm2(std::string const &a, std::string const &b)

if (a.length() != b.length())
return false;

std::array<int, 1 + UCHAR_MAX> s1;
std::array<int, 1 + UCHAR_MAX> s2;

for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;



Notes:



  • Complexity O(n).

  • Overhead per iteration very small.

  • Overhead high for initialization.






share|improve this answer















share|improve this answer



share|improve this answer








edited Apr 18 at 16:58


























answered Apr 17 at 23:23









Martin York

70.8k481244




70.8k481244







  • 2




    What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
    – Toby Speight
    Apr 18 at 9:27










  • Shouldn't that be char or preferable auto for c?
    – Deduplicator
    Apr 18 at 16:21












  • 2




    What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
    – Toby Speight
    Apr 18 at 9:27










  • Shouldn't that be char or preferable auto for c?
    – Deduplicator
    Apr 18 at 16:21







2




2




What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
– Toby Speight
Apr 18 at 9:27




What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256,
– Toby Speight
Apr 18 at 9:27












Shouldn't that be char or preferable auto for c?
– Deduplicator
Apr 18 at 16:21




Shouldn't that be char or preferable auto for c?
– Deduplicator
Apr 18 at 16:21












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192327%2fdetermine-if-one-string-is-a-permutation-of-the-another%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation