Determine if one string is a permutation of another
Can anyone give me feedback on whether I'm using std::unordered_multimap optimally? I don't know the ins and outs of the std::unordered_multimap implementation.
I'm wondering whether comparing two maps iterates through each of the elements in the map, making the algorithm $O(N^2)$, or whether it just compares the hashes of the maps, making the algorithm $O(N)$. Or would it just be better to use an array if I'm given this question in an interview?
bool is_permutation(const std::string& s1, const std::string& s2)
{
    assert(s1.size() && s2.size());
    if (s1.length() != s2.length())
        return false;
    std::unordered_multimap<int, char> m1, m2;
    for (std::size_t i = 0; i < s1.length(); ++i)
    {
        m1.insert({s1[i], s1[i]});  // insert takes a key/value pair
        m2.insert({s2[i], s2[i]});
    }
    if (m1 == m2)
        return true;
    return false;
}
Tags: c++, combinatorics, hash-map
Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than in declarative languages such as SML, where lists are defined recursively as head and tail: O(n) in C++, while the best possible in SML is O(n log n). — Pedro A, Apr 18 at 0:05
You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text. — JDługosz, Apr 18 at 6:14
Sort the characters in both strings and compare them? — Thorbjørn Ravn Andersen, Apr 18 at 8:56
Is there a good reason not to simply return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());? Or should this be tagged reinventing-the-wheel? — Toby Speight, Apr 23 at 19:14
asked Apr 17 at 20:20 by John DeBord
edited Apr 18 at 23:38 by Jamal
3 Answers

Accepted answer (14 votes)
Comparing two maps is linear, so that's not a big problem.
I don't think you've made particularly good use of maps though. As I see it, you have two choices: you could use an unordered_multiset, or you could use an unordered_map<char, size_t>. In the latter case, you'd keep a count of each character in the string, incrementing the count for each character as you scan through the input strings.
Personally, I think it's generally simpler to just sort the two strings, then compare them:
bool is_permutation(std::string s1, std::string s2)
{
    if (s1.length() != s2.length())
        return false;
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    return s1 == s2;
}
A version making good use of an unordered_map:

bool is_perm2(std::string const &a, std::string const &b)
{
    std::unordered_map<char, std::size_t> s1;
    std::unordered_map<char, std::size_t> s2;
    if (a.length() != b.length())
        return false;
    for (unsigned char c : a)
        ++s1[c];
    for (unsigned char c : b)
        ++s2[c];
    return s1 == s2;
}
...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.
I have not, however, gotten nearly as good results from using a multiset or unordered_multiset. In fact, both are quite slow in my testing. I suspect the problem is that, for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, and you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap ends up similar (or probably even worse).
For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:
[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]
So this shows the unordered_map having pretty high overhead but nice, slow growth, so it's the slowest for small strings but the fastest for strings over 8K in size. Sorting is somewhat similar: faster for strings up to a kilobyte, but slower than the unordered_map for larger strings.
Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.
There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) is the optimal choice.
Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, and one with one character changed between them (i.e., one that should return true and another that should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work, though).
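The three kinds of test pairs described above could be generated along these lines (a sketch under my own naming, not the author's actual harness):

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <utility>

// Build an (s1, s2) pair of length n for each scenario:
// identical, one character changed, or a shuffled copy.
enum class Case { Identical, OneCharChanged, Shuffled };

std::pair<std::string, std::string> make_pair_for(std::size_t n, Case kind,
                                                  std::mt19937& rng)
{
    std::uniform_int_distribution<int> dist('a', 'z');
    std::string s1(n, '\0');
    for (char& c : s1)
        c = static_cast<char>(dist(rng));
    std::string s2 = s1;
    if (kind == Case::OneCharChanged && n > 0)
        // Changing one occurrence alters two character counts,
        // so the pair can never be a permutation.
        s2[0] = (s2[0] == 'a') ? 'b' : 'a';
    else if (kind == Case::Shuffled)
        std::shuffle(s2.begin(), s2.end(), rng);  // unequal, but still a permutation
    return {s1, s2};
}
```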
Edit 2:
After more testing, I have to agree with the consensus that counting characters in an array is a better choice:
bool is_perm3(std::string const &a, std::string const &b)
{
    std::array<std::size_t, std::numeric_limits<unsigned char>::max() + 1> s1{};  // {} zero-initializes
    std::array<std::size_t, std::numeric_limits<unsigned char>::max() + 1> s2{};
    if (a.length() != b.length())
        return false;
    for (unsigned char c : a)
        ++s1[c];
    for (unsigned char c : b)
        ++s2[c];
    for (std::size_t i = 0; i < s1.size(); i++)
        if (s1[i] != s2[i])
            return false;
    return true;
}
I remain puzzled (and frankly, rather bothered) by the poor performance of std::vector vs. std::array in this test. We'd expect array to have less initial overhead, but the vector was pre-allocated, so speed differences after setup should be quite minimal (but in this case definitely are not).
Can you explain to me why it is O(N log N)? I don't understand how my solution is O(N log N), or yours. — John DeBord, Apr 17 at 20:30
@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap instead of an unordered_multimap. — Jerry Coffin, Apr 17 at 20:31
Sorting seems like overkill. Count letters and compare histograms in $O(N)$. — vnp, Apr 17 at 21:00
@vnp: I am betting that sorting is the quickest for short strings. — Martin York, Apr 17 at 23:26
@JerryCoffin: Surprised by the array results. Did you just replace the unordered_map with a C array? — Martin York, Apr 17 at 23:30
Answer (15 votes)
I would say yes, use an array.
Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.
You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down the number of times each character appears in the second string. The array finishes as all zeroes precisely when the two strings are permutations of each other.
As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.
bool is_permutation(const std::string& s1, const std::string& s2)
{
    if (s1.length() != s2.length())
        return false;
    // There are usually 256 possible characters, but on rare architectures there may be more.
    constexpr std::size_t possible_char_count = std::numeric_limits<unsigned char>::max() + 1;
    std::array<size_t, possible_char_count> m1 = {};  // zero-initializes all places
    for (char c : s1)  // count up for each character in the first string
        m1[static_cast<unsigned char>(c)]++;
    for (char c : s2)  // count down for each character in the second string
        m1[static_cast<unsigned char>(c)]--;
    for (size_t t : m1)
        // If any character appears a net nonzero number of times,
        // the two strings are not anagrams.
        if (t != 0)
            return false;
    return true;
}
Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.
One reason to prefer a map to an array is that when the input strings are very short or very homogeneous, the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and initialisation overhead. As always, the only reliable way to decide is to profile with your real-world inputs.
Note that 1 << CHAR_BIT is not guaranteed to be 256. — wchargin, Apr 18 at 7:48
I would recommend using something other than long long int: either assert that the strings are short-ish (< 2GB each) and use std::int32_t, or aim for space and use std::int64_t. The latter, of course, incurs a significant memory overhead. — Matthieu M., Apr 18 at 13:12
For those following at home, the assembly of the long long int and int versions => godbolt.org/g/pGpHFt . As expected, (1) the initialization and final checks are optimized and (2) the int version requires only half as many iterations. — Matthieu M., Apr 18 at 13:18
Answer (6 votes)
You have got the basics.
Though @Jerry Coffin makes some good points.
I would take this one step further.
A std::unordered_multimap is good for the general case, where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char: very simple, and a very small number of them (even if you use all 256).
So I would change to use an array of integers, one integer for each character. Let's use std::array so the comparison is easy.
// Stealing @Jerry Coffin's code
bool is_perm2(std::string const &a, std::string const &b)
{
    if (a.length() != b.length())
        return false;
    std::array<int, 1 + UCHAR_MAX> s1 = {};  // zero-initialize the counts
    std::array<int, 1 + UCHAR_MAX> s2 = {};
    for (unsigned char c : a)
        ++s1[c];
    for (unsigned char c : b)
        ++s2[c];
    return s1 == s2;
}
Notes:
- Complexity O(n).
- Overhead per iteration very small.
- Overhead high for initialization.
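A quick sanity check of this counting approach, restated as a self-contained function (the name is mine, and the arrays are explicitly zero-initialized):

```cpp
#include <array>
#include <climits>
#include <string>

// Array-of-counts permutation check: one counter per possible byte value.
bool is_perm_array(std::string const &a, std::string const &b)
{
    if (a.length() != b.length())
        return false;
    std::array<int, 1 + UCHAR_MAX> c1{}, c2{};  // {} zero-initializes
    for (unsigned char c : a)
        ++c1[c];
    for (unsigned char c : b)
        ++c2[c];
    return c1 == c2;
}
```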
What's the significance of casting unsigned char to unsigned char? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX rather than 256. — Toby Speight, Apr 18 at 9:27
Shouldn't that be char, or preferably auto, for c? — Deduplicator, Apr 18 at 16:21
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
14
down vote
accepted
Comparing two maps is linear, so that's not a big problem.
I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset
, or you could use a unordered_map<char, size_t>
. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.
Personally, I think it's generally simpler to just sort the two strings, then compare them:
bool is_permutation(std::string s1, std::string s2)
if (s1.length() != s2.length())
return false;
std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;
A version making good use of an unordered_map
:
bool is_perm2(std::string const &a, std::string const &b)
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;
...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.
I have not, however, gotten nearly as good of results from using a multiset
or unordered_multiset
. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap
ends up similarly (or probably even worse).
For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:
[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]
So this shows the unordered_map
having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.
Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map
for larger strings.
Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.
There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.
Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).
Edit 2:
After more testing, I have to agree with the consensus that counting characters in an array is a better choice:
bool is_perm3(std::string const &a, std::string const &b)
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;
I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector
vs. std::array
in his test. We'd expect array
to have less initial overhead, but the vector
was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).
Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
â John DeBord
Apr 17 at 20:30
@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using amultimap
instead ofunordered_multimap
.
â Jerry Coffin
Apr 17 at 20:31
5
Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
â vnp
Apr 17 at 21:00
3
@vnp: I am betting that sorting is the quickest for short strings.
â Martin York
Apr 17 at 23:26
@JerryCoffin Surprised with array results. Did you just replace theunordered_map
withC-Array
?
â Martin York
Apr 17 at 23:30
 |Â
show 7 more comments
up vote
14
down vote
accepted
Comparing two maps is linear, so that's not a big problem.
I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset
, or you could use a unordered_map<char, size_t>
. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.
Personally, I think it's generally simpler to just sort the two strings, then compare them:
bool is_permutation(std::string s1, std::string s2)
if (s1.length() != s2.length())
return false;
std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;
A version making good use of an unordered_map
:
bool is_perm2(std::string const &a, std::string const &b)
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;
...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.
I have not, however, gotten nearly as good of results from using a multiset
or unordered_multiset
. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap
ends up similarly (or probably even worse).
For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:
[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]
So this shows the unordered_map
having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.
Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map
for larger strings.
Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.
There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.
Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).
Edit 2:
After more testing, I have to agree with the consensus that counting characters in an array is a better choice:
bool is_perm3(std::string const &a, std::string const &b)
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;
I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector
vs. std::array
in his test. We'd expect array
to have less initial overhead, but the vector
was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).
Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
â John DeBord
Apr 17 at 20:30
@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using amultimap
instead ofunordered_multimap
.
â Jerry Coffin
Apr 17 at 20:31
5
Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
â vnp
Apr 17 at 21:00
3
@vnp: I am betting that sorting is the quickest for short strings.
â Martin York
Apr 17 at 23:26
@JerryCoffin Surprised with array results. Did you just replace theunordered_map
withC-Array
?
â Martin York
Apr 17 at 23:30
 |Â
show 7 more comments
up vote
14
down vote
accepted
up vote
14
down vote
accepted
Comparing two maps is linear, so that's not a big problem.
I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset
, or you could use a unordered_map<char, size_t>
. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.
Personally, I think it's generally simpler to just sort the two strings, then compare them:
bool is_permutation(std::string s1, std::string s2)
if (s1.length() != s2.length())
return false;
std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;
A version making good use of an unordered_map
:
bool is_perm2(std::string const &a, std::string const &b)
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;
...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.
I have not, however, gotten nearly as good of results from using a multiset
or unordered_multiset
. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap
ends up similarly (or probably even worse).
For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:
[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]
So this shows the unordered_map
having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.
Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map
for larger strings.
Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.
There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) be the optimal choice.
Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, one with one character changed between them (i.e., one that should return true and another than should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal, but should still return true, but I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work though).
Edit 2:
After more testing, I have to agree with the consensus that counting characters in an array is a better choice:
bool is_perm3(std::string const &a, std::string const &b)
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1;
std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
for (std::size_t i=0; i<s1.size(); i++)
if (s1[i] != s2[i])
return false;
return true;
I remain puzzled (and frankly, rather bothered) but the poor performance of std::vector
vs. std::array
in his test. We'd expect array
to have less initial overhead, but the vector
was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).
Comparing two maps is linear, so that's not a big problem.
I don't think you've made particularly good use of maps though. As I see it, you have two choices. You could use an unordered_multiset
, or you could use a unordered_map<char, size_t>
. In the latter case, you'd keep a count of each character in the string, and increment the count for each character as you scan through the input strings.
Personally, I think it's generally simpler to just sort the two strings, then compare them:
bool is_permutation(std::string s1, std::string s2)
if (s1.length() != s2.length())
return false;
std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;
A version making good use of an unordered_map
:
bool is_perm2(std::string const &a, std::string const &b)
std::unordered_map<char, std::size_t> s1;
std::unordered_map<char, std::size_t> s2;
if (a.length() != b.length())
return false;
for (unsigned char c : a)
++s1[c];
for (unsigned char c : b)
++s2[c];
return s1 == s2;
...will gain a substantial advantage if the strings involved are long. Don't expect miracles though--on typical strings (a few dozen to a few hundred characters) overhead often matters more than computational complexity, so this is unlikely to accomplish much. On the other hand, if you honestly expect to process strings of (say) a megabyte or more on a fairly regular basis, you can expect it to run significantly faster than the others.
I have not, however, gotten nearly as good of results from using a multiset
or unordered_multiset
. In fact, both are quite slow in my testing. I suspect the problem is that for this case, they're both storing a lot of data (a node for each character in the input), so before the string is large enough for computational complexity to matter much, they're overflowing the cache, so you end up with a lot of references to main memory. Note that when used as in the question, std::unordered_multimap
ends up similarly (or probably even worse).
For anybody who might care, here's a chart of how various approaches work out for different sizes of strings:
[This shows the log of the time on the vertical axis, to help keep things visible across a wide range of sizes (and with them, times).]
So this shows the unordered_map
having pretty high overhead, but nice, slow growth, so it's the slowest for small strings, but the fastest for strings over 8K in size.
Sorting is somewhat similar--faster for strings up to a kilobyte, but slower than the unordered_map
for larger strings.
Using a raw array for the counts has behavior similar to sort, but even more dramatic--very fast for small strings, but slows down quite a lot for larger strings.
There seems to be no size at which the set/unordered_set (or unordered_multimap as used in the question) is the optimal choice.
Oh, for anybody who wonders about the odd stair-step look: I generated two tests for strings of each length: one with the strings identical, and one with one character changed between them (i.e., one that should return true and another that should return false). I suppose if I wanted to be really complete, I should also add a test case for strings that are shuffled versions of each other, so they are unequal but should still return true; I haven't bothered with that (I have tested to be reasonably certain all the charted algorithms actually work, though).
Edit 2:
After more testing, I have to agree with the consensus that counting characters in an array is a better choice:
bool is_perm3(std::string const &a, std::string const &b)
{
    std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s1{};
    std::array<std::size_t, std::numeric_limits<unsigned char>::max()+1> s2{};
    if (a.length() != b.length())
        return false;
    for (unsigned char c : a)
        ++s1[c];
    for (unsigned char c : b)
        ++s2[c];
    for (std::size_t i = 0; i < s1.size(); i++)
        if (s1[i] != s2[i])
            return false;
    return true;
}
I remain puzzled (and frankly, rather bothered) by the poor performance of std::vector
vs. std::array
in his test. We'd expect array
to have less initial overhead, but the vector
was pre-allocated, so speed differences after being set up should be quite minimal (but in this case definitely are not).
edited Apr 18 at 16:40
answered Apr 17 at 20:26
Jerry Coffin
27.4k360123
Can you explain to me why it is O(NlogN)? I don't understand how my solution is O(NlogN) as well as yours?
– John DeBord
Apr 17 at 20:30
@JohnDeBord: Sorry--I hadn't read carefully enough (or maybe just my brain slipped a cog), and thought you were using a multimap
instead of unordered_multimap
.
– Jerry Coffin
Apr 17 at 20:31
5
Sorting seems like an overkill. Count letters and compare histograms in $O(N)$.
– vnp
Apr 17 at 21:00
3
@vnp: I am betting that sorting is the quickest for short strings.
– Martin York
Apr 17 at 23:26
@JerryCoffin Surprised with array results. Did you just replace the unordered_map
with C-Array
?
– Martin York
Apr 17 at 23:30
up vote
15
down vote
I would say yes, use an array.
Arrays afford very good memory locality, which is good for cache performance. They essentially are the most basic and efficient hash map if the key is the index.
You only need one array with 256 elements: first count up the number of times each character appears in the first string, and then count down for the number of times each character appears in the second string. The array will finish with all zeroes precisely when the two strings are permutations of each other.
As a further hint at relative complexity, according to godbolt.org, your code produces 849 lines of assembly when compiled on the most recent GCC with -O3 optimisation.
The following code produces 53 lines.
bool is_permutation(const std::string& s1, const std::string& s2)
{
    if (s1.length() != s2.length())
        return false;
    // There are usually 256 possible characters but on rare architectures may be more.
    constexpr std::size_t possible_char_count = std::numeric_limits<unsigned char>::max() + 1;
    std::array<std::size_t, possible_char_count> m1 = {}; // Zero-initializes all places
    for (char c : s1) // Count up for each character in the first string
        m1[static_cast<unsigned char>(c)]++;
    for (char c : s2) // Count down for each character in the second string
        m1[static_cast<unsigned char>(c)]--;
    for (std::size_t t : m1)
        // If any character appears a net nonzero number of times,
        // the two strings are not anagrams.
        if (t != 0)
            return false;
    return true;
}
Lines of code are not, of themselves, a perfect indicator of what is the most efficient algorithm. After all, you can implement a nice short version of bubble sort! Nevertheless when the underlying approach is the same (that of counting characters) it is a fairly good indicator that one implementation is doing unnecessary work.
One reason to prefer a map to an array is that when the input strings are very short or very homogeneous, the map is likely to be more memory efficient. This may translate into speed improvements by avoiding allocation and initialisation overhead. As always, the only reliable way to decide is to profile with your real world inputs.
3
Note that 1 << CHAR_BIT
is not guaranteed to be 256.
– wchargin
Apr 18 at 7:48
I would recommend using something else than long long int
: either assert that the strings are short-ish (< 2GB each) and use std::int32_t
, or aim for space and use std::int64_t
. The latter, of course, incurs a significant memory overhead.
– Matthieu M.
Apr 18 at 13:12
1
For those following at home; the assembly of the long long int
and int
versions => godbolt.org/g/pGpHFt . As expected, (1) the initialization and final checks are optimized and (2) the int
version requires only half as many iterations.
– Matthieu M.
Apr 18 at 13:18
edited Apr 18 at 22:04
answered Apr 17 at 23:24
Josiah
3,182326
up vote
6
down vote
You have got the basics.
Though @Jerry Coffin points out some good points.
I would take this one step further.
A std::unordered_multimap
is good for the general case, where the thing you are finding and comparing can be arbitrarily complex. But here the thing we are comparing is a char
: very simple, and a very small number of them (even if you use all 256).
So I would change to use an array of integers, one integer for each character. Let's use a std::array to allow easier comparisons.
// Stealing @Jerry Coffin's code
bool is_perm2(std::string const &a, std::string const &b)
{
    if (a.length() != b.length())
        return false;
    std::array<int, 1 + UCHAR_MAX> s1 = {};
    std::array<int, 1 + UCHAR_MAX> s2 = {};
    for (unsigned char c : a)
        ++s1[c];
    for (unsigned char c : b)
        ++s2[c];
    return s1 == s2;
}
Notes:
- Complexity O(n).
- Overhead per iteration very small.
- Overhead high for initialization.
2
What's the significance of casting unsigned char
to unsigned char
? Isn't that just an identity? For real portability, I'd write 1 + UCHAR_MAX
rather than 256
.
– Toby Speight
Apr 18 at 9:27
Shouldn't that be char
or preferably auto
for c
?
– Deduplicator
Apr 18 at 16:21
edited Apr 18 at 16:58
answered Apr 17 at 23:23
Martin York
70.8k481244
Fun fact: this problem (of determining if a string is a permutation of another) is the classic example of a problem that is inherently faster in imperative languages such as C++ than declarative languages such as SML where lists are defined recursively as head and tail: O(n) in C++ while the best possible in SML is O(n log n).
– Pedro A
Apr 18 at 0:05
You are assuming that the string contains single-byte characters only. It can get false positives with UTF-8 text.
– JDługosz
Apr 18 at 6:14
4
Sort the characters in both strings and compare them?
– Thorbjørn Ravn Andersen
Apr 18 at 8:56
Is there a good reason not to simply
return std::is_permutation(s1.begin(), s1.end(), s2.begin(), s2.end());
? Or should this be tagged reinventing-the-wheel? – Toby Speight
Apr 23 at 19:14