Retrieve words from dictionary when they meet letter requirements

 Clash Royale CLAN TAG#URR8PPP
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:
function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
 foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
 
 $step = 0;
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 sort($strSplit);
 if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit)) 
 //echo "All specified letters from $letters are in $word 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");
This would return the words api and pia.
getWordsWithOnlySpecifiedLetters($dictionary, "leamps");
This one would return the word sample.
I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).
function getWordsWithSpecifiedLetters(array $dictionary, string $letters)
 $step = 0;
 mb_internal_encoding("UTF-8");
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $wordSplit = array_filter($wordSplit, function($x) use (&$strSplit) 
 if (in_array(strtolower($x), array_map('strtolower', $strSplit), true)) 
 $pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
 unset($strSplit[$pos]);
 return false;
 
 return true;
 );
 if (count(array_diff($strSplit,$wordSplit)) === 0) 
 //echo "$word contains all letters of $letters 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithSpecifiedLetters($dictionary, "ple");
This returns the words sample and apple.
I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?
You can download my dictionary from here and testing your code using dictionary words.
performance php strings array regex
 |Â
show 5 more comments
up vote
2
down vote
favorite
I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:
function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
 foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
 
 $step = 0;
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 sort($strSplit);
 if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit)) 
 //echo "All specified letters from $letters are in $word 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");
This would return the words api and pia.
getWordsWithOnlySpecifiedLetters($dictionary, "leamps");
This one would return the word sample.
I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).
function getWordsWithSpecifiedLetters(array $dictionary, string $letters)
 $step = 0;
 mb_internal_encoding("UTF-8");
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $wordSplit = array_filter($wordSplit, function($x) use (&$strSplit) 
 if (in_array(strtolower($x), array_map('strtolower', $strSplit), true)) 
 $pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
 unset($strSplit[$pos]);
 return false;
 
 return true;
 );
 if (count(array_diff($strSplit,$wordSplit)) === 0) 
 //echo "$word contains all letters of $letters 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithSpecifiedLetters($dictionary, "ple");
This returns the words sample and apple.
I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?
You can download my dictionary from here and testing your code using dictionary words.
performance php strings array regex
 
 
 
 
 
 
 Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
 â juvian
 May 24 at 16:41
 
 
 
 
 
 
 
 
 
 Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
 â Otabek
 May 25 at 13:04
 
 
 
 
 
 
 
 
 
 You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
 â juvian
 May 25 at 19:32
 
 
 
 
 
 
 
 
 
 How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
 â Otabek
 May 27 at 6:45
 
 
 
 
 
 
 
 
 
 Sorry I dont know php, can write pseudocode at best
 â juvian
 May 27 at 6:47
 
 
 
 |Â
show 5 more comments
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:
function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
 foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
 
 $step = 0;
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 sort($strSplit);
 if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit)) 
 //echo "All specified letters from $letters are in $word 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");
This would return the words api and pia.
getWordsWithOnlySpecifiedLetters($dictionary, "leamps");
This one would return the word sample.
I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).
function getWordsWithSpecifiedLetters(array $dictionary, string $letters)
 $step = 0;
 mb_internal_encoding("UTF-8");
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $wordSplit = array_filter($wordSplit, function($x) use (&$strSplit) 
 if (in_array(strtolower($x), array_map('strtolower', $strSplit), true)) 
 $pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
 unset($strSplit[$pos]);
 return false;
 
 return true;
 );
 if (count(array_diff($strSplit,$wordSplit)) === 0) 
 //echo "$word contains all letters of $letters 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithSpecifiedLetters($dictionary, "ple");
This returns the words sample and apple.
I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?
You can download my dictionary from here and testing your code using dictionary words.
performance php strings array regex
I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:
function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
 foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
 
 $step = 0;
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 sort($strSplit);
 if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit)) 
 //echo "All specified letters from $letters are in $word 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");
This would return the words api and pia.
getWordsWithOnlySpecifiedLetters($dictionary, "leamps");
This one would return the word sample.
I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).
function getWordsWithSpecifiedLetters(array $dictionary, string $letters)
 $step = 0;
 mb_internal_encoding("UTF-8");
 $result = ;
 foreach ($dictionary as $word) 
 $step++;
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $wordSplit = array_filter($wordSplit, function($x) use (&$strSplit) 
 if (in_array(strtolower($x), array_map('strtolower', $strSplit), true)) 
 $pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
 unset($strSplit[$pos]);
 return false;
 
 return true;
 );
 if (count(array_diff($strSplit,$wordSplit)) === 0) 
 //echo "$word contains all letters of $letters 
 
 return $result;
Example usage:
$dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
getWordsWithSpecifiedLetters($dictionary, "ple");
This returns the words sample and apple.
I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?
You can download my dictionary from here and testing your code using dictionary words.
performance php strings array regex
edited Jun 10 at 4:34
asked May 24 at 9:16
Otabek
169110
169110
 
 
 
 
 
 
 Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
 â juvian
 May 24 at 16:41
 
 
 
 
 
 
 
 
 
 Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
 â Otabek
 May 25 at 13:04
 
 
 
 
 
 
 
 
 
 You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
 â juvian
 May 25 at 19:32
 
 
 
 
 
 
 
 
 
 How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
 â Otabek
 May 27 at 6:45
 
 
 
 
 
 
 
 
 
 Sorry I dont know php, can write pseudocode at best
 â juvian
 May 27 at 6:47
 
 
 
 |Â
show 5 more comments
 
 
 
 
 
 
 Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
 â juvian
 May 24 at 16:41
 
 
 
 
 
 
 
 
 
 Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
 â Otabek
 May 25 at 13:04
 
 
 
 
 
 
 
 
 
 You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
 â juvian
 May 25 at 19:32
 
 
 
 
 
 
 
 
 
 How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
 â Otabek
 May 27 at 6:45
 
 
 
 
 
 
 
 
 
 Sorry I dont know php, can write pseudocode at best
 â juvian
 May 27 at 6:47
 
 
 
Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
â juvian
May 24 at 16:41
Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
â juvian
May 24 at 16:41
Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
â Otabek
May 25 at 13:04
Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
â Otabek
May 25 at 13:04
You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
â juvian
May 25 at 19:32
You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
â juvian
May 25 at 19:32
How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
â Otabek
May 27 at 6:45
How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
â Otabek
May 27 at 6:45
Sorry I dont know php, can write pseudocode at best
â juvian
May 27 at 6:47
Sorry I dont know php, can write pseudocode at best
â juvian
May 27 at 6:47
 |Â
show 5 more comments
 3 Answers
 3
 
active
oldest
votes
up vote
3
down vote
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:
lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
øúþÃÂ÷ | ÃÂþ÷øú
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.
SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'
When you want to match the $needle characters at a minimum, you call:
SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'
Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:
function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
 $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
 $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($lettersSplit);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word, $encoding) == $lettersLength) 
 $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if ($wordSplit === $lettersSplit) 
 $result = $word;
 
 
 
 return $result;
Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
 
 
 
 
 
 
 According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
 â Otabek
 Jun 7 at 6:50
 
 
 
 
 
 
 
 
 
 That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond- word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
 â mickmackusa
 Jun 7 at 6:52
 
 
 
 
 
 
 
 
 
 
 @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
 â mickmackusa
 Jun 8 at 1:04
 
 
 
 
 
 
 
 
 
 Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
 â Otabek
 Jun 9 at 9:44
 
 
 
 
 
 1
 
 
 
 
 All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
 â mickmackusa
 Jun 9 at 13:05
 
 
 
 |Â
show 8 more comments
up vote
2
down vote
Your first function can be easily improved by two ways.
Avoid changing the contents of $dictionary.
foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
can be suppressed, simply inserting this test at the begin of the next foreach():
if(mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for tests purpose, so you can give up.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:
function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $strSplitLower = array_map('mb_strtolower', $strSplit);
 sort($strSplitLower);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word) <= mb_strlen($letters)) 
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) 
 $result = $word;
 
 
 
 return $result;
From this you might derive some improvements for your second function.
 
 
 
 
 
 
 You tested your edit function code? @cFreed
 â Otabek
 May 24 at 12:57
 
 
 
 
 
 
 
 
 
 This part of code return false:- if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
 â Otabek
 May 24 at 13:01
 
 
 
 
 
 
 
 
 
 @Otabek Yes, I tested, using your- aipsample. And it works for me.
 â cFreed
 May 24 at 13:04
 
 
 
 
 
 
 
 
 
 Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
 â Otabek
 May 24 at 13:12
 
 
 
 
 
 
 
 
 
 @Otabek 3v4l.org/tBnY6.
 â cFreed
 May 24 at 13:21
 
 
 
 |Â
show 4 more comments
up vote
1
down vote
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
 words = dictionary
 letters = ['a'..'z']
 wordDataList = 
 for each word in words:
 wordData = new wordData()
 wordData.word = word;
 wordData.num = process(word)
 wordDataList.add(wordData)
 function process(word):
 num = 0
 for idx = 0 to letters.size():
 if letters[idx] in word:
 num = num + (1 << idx)
 return num 
2) Queries:
function query(letters, allowOtherLetters):
 matching = 
 num = process(letters)
 for wordData in wordDataList:
 if (allowOtherLetters == false and wordData.num == num):
 matching.add(wordData.word)
 else if (allowOtherLetters and (wordData.num & num) == num):
 matching.add(wordData.word)
 return matching
 
 
 
 
 
 
 This code in python? @juvian
 â Otabek
 May 29 at 2:10
 
 
 
 
 
 
 
 
 
 This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
 â Otabek
 May 29 at 2:16
 
 
 
 
 
 
 
 
 
 @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
 â juvian
 May 29 at 4:14
 
 
 
add a comment |Â
 3 Answers
 3
 
active
oldest
votes
 3 Answers
 3
 
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:
lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
øúþÃÂ÷ | ÃÂþ÷øú
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.
SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'
When you want to match the $needle characters at a minimum, you call:
SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'
Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:
function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
 $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
 $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($lettersSplit);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word, $encoding) == $lettersLength) 
 $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if ($wordSplit === $lettersSplit) 
 $result = $word;
 
 
 
 return $result;
Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
 
 
 
 
 
 
 According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
 â Otabek
 Jun 7 at 6:50
 
 
 
 
 
 
 
 
 
 That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond- word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
 â mickmackusa
 Jun 7 at 6:52
 
 
 
 
 
 
 
 
 
 
 @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
 â mickmackusa
 Jun 8 at 1:04
 
 
 
 
 
 
 
 
 
 Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
 â Otabek
 Jun 9 at 9:44
 
 
 
 
 
 1
 
 
 
 
 All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
 â mickmackusa
 Jun 9 at 13:05
 
 
 
 |Â
show 8 more comments
up vote
3
down vote
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:
lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
øúþÃÂ÷ | ÃÂþ÷øú
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.
SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'
When you want to match the $needle characters at a minimum, you call:
SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'
Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:
function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
 $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
 $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($lettersSplit);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word, $encoding) == $lettersLength) 
 $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if ($wordSplit === $lettersSplit) 
 $result = $word;
 
 
 
 return $result;
Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
 
 
 
 
 
 
 According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
 â Otabek
 Jun 7 at 6:50
 
 
 
 
 
 
 
 
 
 That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond- word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
 â mickmackusa
 Jun 7 at 6:52
 
 
 
 
 
 
 
 
 
 
 @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
 â mickmackusa
 Jun 8 at 1:04
 
 
 
 
 
 
 
 
 
 Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
 â Otabek
 Jun 9 at 9:44
 
 
 
 
 
 1
 
 
 
 
 All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
 â mickmackusa
 Jun 9 at 13:05
 
 
 
 |Â
show 8 more comments
up vote
3
down vote
up vote
3
down vote
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:
lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
øúþÃÂ÷ | ÃÂþ÷øú
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.
SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'
When you want to match the $needle characters at a minimum, you call:
SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'
Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:
function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
 $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
 $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($lettersSplit);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word, $encoding) == $lettersLength) 
 $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if ($wordSplit === $lettersSplit) 
 $result = $word;
 
 
 
 return $result;
Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:
lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
øúþÃÂ÷ | ÃÂþ÷øú
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.
SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'
When you want to match the $needle characters at a minimum, you call:
SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'
Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:
function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
 $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
 $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($lettersSplit);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word, $encoding) == $lettersLength) 
 $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if ($wordSplit === $lettersSplit) 
 $result = $word;
 
 
 
 return $result;
Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
edited Jun 10 at 0:22
answered May 31 at 16:07


mickmackusa
790112
790112
 
 
 
 
 
 
 According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
 â Otabek
 Jun 7 at 6:50
 
 
 
 
 
 
 
 
 
 That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond- word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
 â mickmackusa
 Jun 7 at 6:52
 
 
 
 
 
 
 
 
 
 
 @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
 â mickmackusa
 Jun 8 at 1:04
 
 
 
 
 
 
 
 
 
 Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
 â Otabek
 Jun 9 at 9:44
 
 
 
 
 
 1
 
 
 
 
 All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
 â mickmackusa
 Jun 9 at 13:05
 
 
 
 |Â
show 8 more comments
 
 
 
 
 
 
 According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
 â Otabek
 Jun 7 at 6:50
 
 
 
 
 
 
 
 
 
 That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond- word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
 â mickmackusa
 Jun 7 at 6:52
 
 
 
 
 
 
 
 
 
 
 @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
 â mickmackusa
 Jun 8 at 1:04
 
 
 
 
 
 
 
 
 
 Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
 â Otabek
 Jun 9 at 9:44
 
 
 
 
 
 1
 
 
 
 
 All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
 â mickmackusa
 Jun 9 at 13:05
 
 
 
According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
â Otabek
Jun 7 at 6:50
According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
â Otabek
Jun 7 at 6:50
That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond
word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.â mickmackusa
Jun 7 at 6:52
That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond
word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.â mickmackusa
Jun 7 at 6:52
@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
â mickmackusa
Jun 8 at 1:04
@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
â mickmackusa
Jun 8 at 1:04
Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
â Otabek
Jun 9 at 9:44
Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
â Otabek
Jun 9 at 9:44
1
1
All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
â mickmackusa
Jun 9 at 13:05
All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
â mickmackusa
Jun 9 at 13:05
 |Â
show 8 more comments
up vote
2
down vote
Your first function can be easily improved by two ways.
Avoid changing the contents of $dictionary.
foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
can be suppressed, simply inserting this test at the begin of the next foreach():
if(mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for tests purpose, so you can give up.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:
function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $strSplitLower = array_map('mb_strtolower', $strSplit);
 sort($strSplitLower);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word) <= mb_strlen($letters)) 
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) 
 $result = $word;
 
 
 
 return $result;
From this you might derive some improvements for your second function.
 
 
 
 
 
 
 You tested your edit function code? @cFreed
 â Otabek
 May 24 at 12:57
 
 
 
 
 
 
 
 
 
 This part of code return false:- if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
 â Otabek
 May 24 at 13:01
 
 
 
 
 
 
 
 
 
 @Otabek Yes, I tested, using your- aipsample. And it works for me.
 â cFreed
 May 24 at 13:04
 
 
 
 
 
 
 
 
 
 Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
 â Otabek
 May 24 at 13:12
 
 
 
 
 
 
 
 
 
 @Otabek 3v4l.org/tBnY6.
 â cFreed
 May 24 at 13:21
 
 
 
 |Â
show 4 more comments
up vote
2
down vote
Your first function can be easily improved by two ways.
Avoid changing the contents of $dictionary.
foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
can be suppressed, simply inserting this test at the begin of the next foreach():
if(mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for tests purpose, so you can give up.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:
function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $strSplitLower = array_map('mb_strtolower', $strSplit);
 sort($strSplitLower);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word) <= mb_strlen($letters)) 
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) 
 $result = $word;
 
 
 
 return $result;
From this you might derive some improvements for your second function.
 
 
 
 
 
 
 You tested your edit function code? @cFreed
 â Otabek
 May 24 at 12:57
 
 
 
 
 
 
 
 
 
 This part of code return false:- if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
 â Otabek
 May 24 at 13:01
 
 
 
 
 
 
 
 
 
 @Otabek Yes, I tested, using your- aipsample. And it works for me.
 â cFreed
 May 24 at 13:04
 
 
 
 
 
 
 
 
 
 Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
 â Otabek
 May 24 at 13:12
 
 
 
 
 
 
 
 
 
 @Otabek 3v4l.org/tBnY6.
 â cFreed
 May 24 at 13:21
 
 
 
 |Â
show 4 more comments
up vote
2
down vote
up vote
2
down vote
Your first function can be easily improved by two ways.
Avoid changing the contents of $dictionary.
foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
can be suppressed, simply inserting this test at the begin of the next foreach():
if(mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for tests purpose, so you can give up.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:
function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $strSplitLower = array_map('mb_strtolower', $strSplit);
 sort($strSplitLower);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word) <= mb_strlen($letters)) 
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) 
 $result = $word;
 
 
 
 return $result;
From this you might derive some improvements for your second function.
Your first function can be easily improved by two ways.
Avoid changing the contents of $dictionary.
foreach ($dictionary as $key => $value) 
 if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
can be suppressed, simply inserting this test at the begin of the next foreach():
if(mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for tests purpose, so you can give up.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:
function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)
 $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
 $strSplitLower = array_map('mb_strtolower', $strSplit);
 sort($strSplitLower);
 $result = ;
 foreach ($dictionary as $word) 
 if(mb_strlen($word) <= mb_strlen($letters)) 
 $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
 sort($wordSplit);
 if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) 
 $result = $word;
 
 
 
 return $result;
From this you might derive some improvements for your second function.
edited May 24 at 13:49
answered May 24 at 12:46


cFreed
2,438719
2,438719
 
 
 
 
 
 
 You tested your edit function code? @cFreed
 â Otabek
 May 24 at 12:57
 
 
 
 
 
 
 
 
 
 This part of code return false:- if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
 â Otabek
 May 24 at 13:01
 
 
 
 
 
 
 
 
 
 @Otabek Yes, I tested, using your- aipsample. And it works for me.
 â cFreed
 May 24 at 13:04
 
 
 
 
 
 
 
 
 
 Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
 â Otabek
 May 24 at 13:12
 
 
 
 
 
 
 
 
 
 @Otabek 3v4l.org/tBnY6.
 â cFreed
 May 24 at 13:21
 
 
 
 |Â
show 4 more comments
 
 
 
 
 
 
 You tested your edit function code? @cFreed
 â Otabek
 May 24 at 12:57
 
 
 
 
 
 
 
 
 
 This part of code return false:- if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
 â Otabek
 May 24 at 13:01
 
 
 
 
 
 
 
 
 
 @Otabek Yes, I tested, using your- aipsample. And it works for me.
 â cFreed
 May 24 at 13:04
 
 
 
 
 
 
 
 
 
 Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
 â Otabek
 May 24 at 13:12
 
 
 
 
 
 
 
 
 
 @Otabek 3v4l.org/tBnY6.
 â cFreed
 May 24 at 13:21
 
 
 
You tested your edit function code? @cFreed
â Otabek
May 24 at 12:57
You tested your edit function code? @cFreed
â Otabek
May 24 at 12:57
This part of code return false:
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)â Otabek
May 24 at 13:01
This part of code return false:
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)â Otabek
May 24 at 13:01
@Otabek Yes, I tested, using your
aip sample. And it works for me.â cFreed
May 24 at 13:04
@Otabek Yes, I tested, using your
aip sample. And it works for me.â cFreed
May 24 at 13:04
Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
â Otabek
May 24 at 13:12
Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
â Otabek
May 24 at 13:12
@Otabek 3v4l.org/tBnY6.
â cFreed
May 24 at 13:21
@Otabek 3v4l.org/tBnY6.
â cFreed
May 24 at 13:21
 |Â
show 4 more comments
up vote
1
down vote
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
 words = dictionary
 letters = ['a'..'z']
 wordDataList = 
 for each word in words:
 wordData = new wordData()
 wordData.word = word;
 wordData.num = process(word)
 wordDataList.add(wordData)
 function process(word):
 num = 0
 for idx = 0 to letters.size():
 if letters[idx] in word:
 num = num + (1 << idx)
 return num 
2) Queries:
function query(letters, allowOtherLetters):
 matching = 
 num = process(letters)
 for wordData in wordDataList:
 if (allowOtherLetters == false and wordData.num == num):
 matching.add(wordData.word)
 else if (allowOtherLetters and (wordData.num & num) == num):
 matching.add(wordData.word)
 return matching
 
 
 
 
 
 
 This code in python? @juvian
 â Otabek
 May 29 at 2:10
 
 
 
 
 
 
 
 
 
 This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
 â Otabek
 May 29 at 2:16
 
 
 
 
 
 
 
 
 
 @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
 â juvian
 May 29 at 4:14
 
 
 
add a comment |Â
up vote
1
down vote
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
 words = dictionary
 letters = ['a'..'z']
 wordDataList = 
 for each word in words:
 wordData = new wordData()
 wordData.word = word;
 wordData.num = process(word)
 wordDataList.add(wordData)
 function process(word):
 num = 0
 for idx = 0 to letters.size():
 if letters[idx] in word:
 num = num + (1 << idx)
 return num 
2) Queries:
function query(letters, allowOtherLetters):
 matching = 
 num = process(letters)
 for wordData in wordDataList:
 if (allowOtherLetters == false and wordData.num == num):
 matching.add(wordData.word)
 else if (allowOtherLetters and (wordData.num & num) == num):
 matching.add(wordData.word)
 return matching
 
 
 
 
 
 
 This code in python? @juvian
 â Otabek
 May 29 at 2:10
 
 
 
 
 
 
 
 
 
 This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
 â Otabek
 May 29 at 2:16
 
 
 
 
 
 
 
 
 
 @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
 â juvian
 May 29 at 4:14
 
 
 
add a comment |Â
up vote
1
down vote
up vote
1
down vote
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
 words = dictionary
 letters = ['a'..'z']
 wordDataList = 
 for each word in words:
 wordData = new wordData()
 wordData.word = word;
 wordData.num = process(word)
 wordDataList.add(wordData)
 function process(word):
 num = 0
 for idx = 0 to letters.size():
 if letters[idx] in word:
 num = num + (1 << idx)
 return num 
2) Queries:
function query(letters, allowOtherLetters):
 matching = 
 num = process(letters)
 for wordData in wordDataList:
 if (allowOtherLetters == false and wordData.num == num):
 matching.add(wordData.word)
 else if (allowOtherLetters and (wordData.num & num) == num):
 matching.add(wordData.word)
 return matching
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
 words = dictionary
 letters = ['a'..'z']
 wordDataList = 
 for each word in words:
 wordData = new wordData()
 wordData.word = word;
 wordData.num = process(word)
 wordDataList.add(wordData)
 function process(word):
 num = 0
 for idx = 0 to letters.size():
 if letters[idx] in word:
 num = num + (1 << idx)
 return num 
2) Queries:
function query(letters, allowOtherLetters):
 matching = 
 num = process(letters)
 for wordData in wordDataList:
 if (allowOtherLetters == false and wordData.num == num):
 matching.add(wordData.word)
 else if (allowOtherLetters and (wordData.num & num) == num):
 matching.add(wordData.word)
 return matching
answered May 28 at 16:40


juvian
85838
85838
 
 
 
 
 
 
 This code in python? @juvian
 â Otabek
 May 29 at 2:10
 
 
 
 
 
 
 
 
 
 This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
 â Otabek
 May 29 at 2:16
 
 
 
 
 
 
 
 
 
 @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
 â juvian
 May 29 at 4:14
 
 
 
add a comment |Â
 
 
 
 
 
 
 This code in python? @juvian
 â Otabek
 May 29 at 2:10
 
 
 
 
 
 
 
 
 
 This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
 â Otabek
 May 29 at 2:16
 
 
 
 
 
 
 
 
 
 @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
 â juvian
 May 29 at 4:14
 
 
 
This code in python? @juvian
â Otabek
May 29 at 2:10
This code in python? @juvian
â Otabek
May 29 at 2:10
This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
â Otabek
May 29 at 2:16
This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
â Otabek
May 29 at 2:16
@Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
â juvian
May 29 at 4:14
@Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
â juvian
May 29 at 4:14
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f195074%2fretrieve-words-from-dictionary-when-they-meet-letter-requirements%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
â juvian
May 24 at 16:41
Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
â Otabek
May 25 at 13:04
You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
â juvian
May 25 at 19:32
How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
â Otabek
May 27 at 6:45
Sorry I dont know php, can write pseudocode at best
â juvian
May 27 at 6:47