Retrieve words from dictionary when they meet letter requirements
I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:

    function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
    {
        foreach ($dictionary as $key => $value) {
            if (mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
        }
        $step = 0;
        $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
        $result = [];
        foreach ($dictionary as $word) {
            $step++;
            $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
            sort($wordSplit);
            sort($strSplit);
            if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit)) {
                //echo "All specified letters from $letters are in $word
                $result[] = $word;
            }
        }
        return $result;
    }

Example usage:

    $dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
    getWordsWithOnlySpecifiedLetters($dictionary, "aip");

This would return the words api and pia.

    getWordsWithOnlySpecifiedLetters($dictionary, "leamps");

This one would return the word sample.
I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).

    function getWordsWithSpecifiedLetters(array $dictionary, string $letters)
    {
        $step = 0;
        mb_internal_encoding("UTF-8");
        $result = [];
        foreach ($dictionary as $word) {
            $step++;
            $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
            $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
            $wordSplit = array_filter($wordSplit, function ($x) use (&$strSplit) {
                if (in_array(strtolower($x), array_map('strtolower', $strSplit), true)) {
                    $pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
                    unset($strSplit[$pos]);
                    return false;
                }
                return true;
            });
            if (count(array_diff($strSplit, $wordSplit)) === 0) {
                //echo "$word contains all letters of $letters
                $result[] = $word;
            }
        }
        return $result;
    }

Example usage:

    $dictionary = ['apple', 'sample', 'api', 'pia', 'ÃÂþ÷øú'];
    getWordsWithSpecifiedLetters($dictionary, "ple");

This returns the words sample and apple.
I have more than 90,000 words in my dictionary (UTF-8). This results in a very slow program; searching the full dictionary may take tens of thousands of loop iterations. How can I improve the performance of these functions?
You can download my dictionary from here and test your code against its words.
performance php strings array regex
asked May 24 at 9:16, edited Jun 10 at 4:34 – Otabek
Is your dictionary always the same? Are you planning to make multiple getWordsWithSpecifiedLetters calls with that dictionary? – juvian, May 24 at 16:41
Yes! I use a dynamic list of words from a database. The words will be updated every day. I use SQLite for the database, and I now have more than 90,000 words in it. @juvian – Otabek, May 25 at 13:04
You can preprocess your dictionary by storing each word together with a string of its unique letters, sorted. Then you sort your dictionary by these unique-letter keys. For a getWordsWithOnlySpecifiedLetters query, you can sort the letters from the input and then do a binary search on your dictionary. You can obtain the result for this query in O(log n + k), where k is the number of words that fit the criteria. – juvian, May 25 at 19:32
How can this be implemented? With PHP or in SQL? Can you show example code? @juvian – Otabek, May 27 at 6:45
Sorry, I don't know PHP; I can write pseudocode at best. – juvian, May 27 at 6:47
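Since example code was requested for the preprocessing idea in the comments, here is one possible PHP sketch of it, not a definitive implementation. Each word is paired with a key made of its unique letters, sorted; the index is sorted by key once; a query then binary-searches for the first entry with the matching key and collects the k entries that share it. The function names uniqueSortedLetters, buildSortedIndex and findByLetters are hypothetical. Because the key uses unique letters, repeated letters are ignored, which differs slightly from the original getWordsWithOnlySpecifiedLetters.

```php
<?php
// Hypothetical helper: lowercase the text, drop repeated letters, sort the rest.
function uniqueSortedLetters(string $text, string $encoding = 'UTF-8'): string
{
    $chars = preg_split('//u', mb_strtolower($text, $encoding), -1, PREG_SPLIT_NO_EMPTY);
    $chars = array_unique($chars);
    sort($chars, SORT_STRING); // byte order; it only has to be consistent
    return implode('', $chars);
}

// Preprocessing, done once: a list of [key, word] pairs sorted by key.
function buildSortedIndex(array $dictionary): array
{
    $index = [];
    foreach ($dictionary as $word) {
        $index[] = [uniqueSortedLetters($word), $word];
    }
    usort($index, function (array $a, array $b): int {
        return strcmp($a[0], $b[0]);
    });
    return $index;
}

// Query: binary search for the first entry whose key equals the query key,
// then walk forward while the key still matches (O(log n + k)).
function findByLetters(array $index, string $letters): array
{
    $key = uniqueSortedLetters($letters);
    $lo = 0;
    $hi = count($index);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($index[$mid][0], $key) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    $result = [];
    for ($i = $lo; $i < count($index) && $index[$i][0] === $key; $i++) {
        $result[] = $index[$i][1];
    }
    return $result;
}

$index = buildSortedIndex(['apple', 'sample', 'api', 'pia']);
$found = findByLetters($index, 'aip'); // api and pia share the key "aip"
```

The index only needs rebuilding when the word list changes (daily, per the comments above), so the sorting cost is not paid per query.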
3 Answers
Answer (3 votes)
How about removing the dictionary preparation each time at a cost of increasing your dictionary width?
You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:

    lookup | word
    -----------------
    aelpp  | apple
    aelmps | sample
    aip    | api
    aip    | pia
    øúþÃÂ÷ | ÃÂþ÷øú

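As a rough illustration of how the lookup value could be produced in PHP, the sketch below lowercases a word, splits it into UTF-8 characters and sorts them. The helper name lookupKey is hypothetical, and the sort is plain byte order rather than a locale-aware alphabetical order; that is fine as long as the same function is used for both the stored words and the search input.

```php
<?php
// Hypothetical helper: builds the value for the `lookup` column.
function lookupKey(string $word, string $encoding = 'UTF-8'): string
{
    $chars = preg_split('//u', mb_strtolower($word, $encoding), -1, PREG_SPLIT_NO_EMPTY);
    sort($chars, SORT_STRING);
    return implode('', $chars);
}

// Build (lookup, word) rows once, for example before inserting them into the table.
$rows = [];
foreach (['apple', 'sample', 'api', 'pia'] as $word) {
    $rows[] = [lookupKey($word), $word];
}
```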
Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.

    SELECT `word` FROM `dictionary` WHERE `lookup` = 'øúþÃÂ÷'

When you want to match the $needle characters at a minimum, you call:

    SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*ø.*ú.*þ.*ÃÂ.*÷.*'

SQLite does not define a REGEXP function by default, so this relies on something like the technique in "Custom REGEXP Function to be used in a SQLITE SELECT Statement", with this intended usage: ~.*ø.*ú.*þ.*ÃÂ.*÷.*~u
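To make the pattern idea concrete, here is a small PHP sketch that builds such a pattern from the search letters and tests it against a few lookup values in memory. The function name containsPattern is hypothetical, and the sample data is taken from the table above. The letters must be sorted the same way as the lookup column, otherwise the ".*" gaps cannot line up.

```php
<?php
// Hypothetical helper: turns "ple" into the pattern ~e.*l.*p~u.
function containsPattern(string $letters, string $encoding = 'UTF-8'): string
{
    $chars = preg_split('//u', mb_strtolower($letters, $encoding), -1, PREG_SPLIT_NO_EMPTY);
    sort($chars, SORT_STRING); // same ordering as the stored lookup values
    $quoted = array_map(function (string $c): string {
        return preg_quote($c, '~'); // escape regex metacharacters and the delimiter
    }, $chars);
    return '~' . implode('.*', $quoted) . '~u';
}

// (lookup, word) pairs as they would be stored in the table.
$rows = [['aelpp', 'apple'], ['aelmps', 'sample'], ['aip', 'api'], ['aip', 'pia']];

$pattern = containsPattern('ple');
$matches = [];
foreach ($rows as [$lookup, $word]) {
    if (preg_match($pattern, $lookup) === 1) {
        $matches[] = $word;
    }
}
```

The leading and trailing ".*" from the SQL version are omitted here because preg_match is unanchored and already matches anywhere in the string.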
This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.
And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.
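One common way to keep the $needle out of the SQL text is a prepared statement. The sketch below assumes the pdo_sqlite extension and uses an in-memory database purely for illustration; the table and column names follow the example above, while the index name idx_dictionary_lookup and the index itself are my additions, on the assumption that an index on lookup helps SQLite avoid a full table scan for the = comparison.

```php
<?php
// Assumes the pdo_sqlite extension is available; in-memory database for illustration only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE dictionary (lookup TEXT NOT NULL, word TEXT NOT NULL)');
$pdo->exec('CREATE INDEX idx_dictionary_lookup ON dictionary (lookup)');

// The "heavy lifting" is done once, when the rows are stored.
$insert = $pdo->prepare('INSERT INTO dictionary (lookup, word) VALUES (?, ?)');
foreach ([['aelpp', 'apple'], ['aelmps', 'sample'], ['aip', 'api'], ['aip', 'pia']] as $row) {
    $insert->execute($row);
}

// $needle is assumed to be lowercased and letter-sorted already.
$needle = 'aip';
$select = $pdo->prepare('SELECT word FROM dictionary WHERE lookup = ? ORDER BY word');
$select->execute([$needle]); // the value is bound, never concatenated into the SQL
$words = $select->fetchAll(PDO::FETCH_COLUMN);
```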
Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should need character sorting and lowercasing at search time. These processes are expected to be "already done" on words prior to being stored in the dictionary.
Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?
If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:
Code:

    function getWordsContainingTheExactSpecifiedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
    {
        $lettersLength = mb_strlen($letters, $encoding); // call just once and cache
        $lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
        sort($lettersSplit);
        $result = [];
        foreach ($dictionary as $word) {
            if (mb_strlen($word, $encoding) == $lettersLength) {
                $wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
                sort($wordSplit);
                if ($wordSplit === $lettersSplit) {
                    $result[] = $word;
                }
            }
        }
        return $result;
    }

Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.
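As one possible way to change that qualifying condition, the sketch below keeps any word that contains at least as many of each letter as the input does. It swaps the sort-and-compare test for per-letter counts; the function names letterCounts and getWordsContainingLetters are hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper: maps each lowercase character to how often it occurs.
function letterCounts(string $text, string $encoding = 'UTF-8'): array
{
    $chars = preg_split('//u', mb_strtolower($text, $encoding), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($chars);
}

// Keeps words that contain every requested letter at least as often as requested.
function getWordsContainingLetters(array $dictionary, string $letters, string $encoding = 'UTF-8'): array
{
    $needed = letterCounts($letters, $encoding); // computed once, outside the loop
    $minLength = mb_strlen($letters, $encoding);
    $result = [];
    foreach ($dictionary as $word) {
        if (mb_strlen($word, $encoding) < $minLength) {
            continue; // too short to contain all the letters
        }
        $have = letterCounts($word, $encoding);
        $containsAll = true;
        foreach ($needed as $char => $count) {
            if (($have[$char] ?? 0) < $count) {
                $containsAll = false;
                break;
            }
        }
        if ($containsAll) {
            $result[] = $word;
        }
    }
    return $result;
}

$words = getWordsContainingLetters(['apple', 'sample', 'api', 'pia'], 'ple');
```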
According to your algorithm, the size of the database is doubled, but word searches become faster. @mickmackusa – Otabek, Jun 7 at 6:50
That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on whether you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed. – mickmackusa, Jun 7 at 6:52
@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you? – mickmackusa, Jun 8 at 1:04
Honestly, I have not tried your suggestion yet. Of course, I'll try your proposal, test the speed of the search, and then tell you the result. But such a big load on the computer's memory is still not suitable for me. Do you know of other search options with a minimal memory load? @mickmackusa – Otabek, Jun 9 at 9:44
All that said, my overarching advice, because performance is the priority, is to bake all of the bread before you open the bakery to customers; then you only have one job to do. – mickmackusa, Jun 9 at 13:05
Answer (2 votes)
Your first function can easily be improved in two ways.
Avoid changing the contents of $dictionary.

    foreach ($dictionary as $key => $value)
        if (mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);

can be removed by simply inserting this test at the beginning of the next foreach():

    if (mb_strlen($word) <= mb_strlen($letters))

Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once and for all before entering the loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for testing purposes, so you can drop it.
Finally
Taking advantage of the above recommendations, the following modified script should take less time to execute:

    function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
    {
        $strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
        $strSplitLower = array_map('mb_strtolower', $strSplit);
        sort($strSplitLower);
        $result = [];
        foreach ($dictionary as $word) {
            if (mb_strlen($word) <= mb_strlen($letters)) {
                $wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
                sort($wordSplit);
                if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) {
                    $result[] = $word;
                }
            }
        }
        return $result;
    }

From this you might derive some improvements for your second function.
Did you test your edited function code? @cFreed – Otabek, May 24 at 12:57
This part of the code returns false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower) – Otabek, May 24 at 13:01
@Otabek Yes, I tested it using your aip sample, and it works for me. – cFreed, May 24 at 13:04
Please show me a real example on https://3v4l.org. It does not work for me right now. I use PHP 7.1. – Otabek, May 24 at 13:12
@Otabek 3v4l.org/tBnY6. – cFreed, May 24 at 13:21
Answer (1 vote)
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:

    1) Preprocessing:

    words = dictionary
    letters = ['a'..'z']
    wordDataList = []
    for each word in words:
        wordData = new wordData()
        wordData.word = word
        wordData.num = process(word)
        wordDataList.add(wordData)

    function process(word):
        num = 0
        for idx = 0 to letters.size():
            if letters[idx] in word:
                num = num + (1 << idx)
        return num

    2) Queries:

    function query(letters, allowOtherLetters):
        matching = []
        num = process(letters)
        for wordData in wordDataList:
            if (allowOtherLetters == false and wordData.num == num):
                matching.add(wordData.word)
            else if (allowOtherLetters and (wordData.num & num) == num):
                matching.add(wordData.word)
        return matching

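The pseudocode above might translate to PHP roughly as follows. This is a sketch under the same a-z assumption, not a definitive implementation, and the names letterMask, buildMaskIndex and queryMasks are hypothetical. As in the pseudocode, each letter sets one bit regardless of how often it occurs, so an exact query compares sets of letters, not letter counts.

```php
<?php
// Sets bit idx for each letter a-z present in the word (other characters are ignored).
function letterMask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $ch) {
        $idx = ord($ch) - ord('a');
        if ($idx >= 0 && $idx < 26) {
            $mask |= 1 << $idx;
        }
    }
    return $mask;
}

// 1) Preprocessing: compute every word's mask once.
function buildMaskIndex(array $words): array
{
    $index = [];
    foreach ($words as $word) {
        $index[] = ['word' => $word, 'mask' => letterMask($word)];
    }
    return $index;
}

// 2) Queries: one integer comparison per word.
function queryMasks(array $index, string $letters, bool $allowOtherLetters): array
{
    $mask = letterMask($letters);
    $matching = [];
    foreach ($index as $entry) {
        if ($allowOtherLetters) {
            // every requested letter must be present; extra letters are fine
            if (($entry['mask'] & $mask) === $mask) {
                $matching[] = $entry['word'];
            }
        } elseif ($entry['mask'] === $mask) {
            // exactly the same set of letters
            $matching[] = $entry['word'];
        }
    }
    return $matching;
}

$index = buildMaskIndex(['apple', 'sample', 'api', 'pia']);
$exact = queryMasks($index, 'aip', false);
$containing = queryMasks($index, 'ple', true);
```

Supporting another alphabet would mean replacing the ord()-based index with a lookup table from each letter to a bit position.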
Is this code in Python? @juvian – Otabek, May 29 at 2:10
Can't this code find words with UTF-8 letters? I have no words with a-z letters in my dictionary; my words are in Cyrillic letters. @juvian – Otabek, May 29 at 2:16
@Otabek It's pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters, though. – juvian, May 29 at 4:14
edited Jun 10 at 0:22
answered May 31 at 16:07
mickmackusa
According to your algorithm, the size of the database is doubled, but the speed of searching for words increases. @mickmackusa
– Otabek
Jun 7 at 6:50
That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on whether you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
– mickmackusa
Jun 7 at 6:52
@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
– mickmackusa
Jun 8 at 1:04
Honestly, I have not tried your suggestion yet. Of course, I will try your proposal, test the search speed, and then tell you the result. But such a heavy load on the computer's memory is not suitable for me. Do you know of other search options with a minimal memory load? @mickmackusa
– Otabek
Jun 9 at 9:44
All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
– mickmackusa
Jun 9 at 13:05
Your first function can easily be improved in two ways.
Avoid changing the contents of $dictionary.
The loop
    foreach ($dictionary as $key => $value) {
        if (mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);
    }
can be removed by simply inserting this test at the beginning of the next foreach():
    if (mb_strlen($word) <= mb_strlen($letters))
Don't repeat $letters processing.
Currently you're sorting $strSplit at each foreach() step, while it can be done once and for all before entering the loop.
Likewise for array_map('mb_strtolower', $strSplit).
(also drop useless code)
It appears that $step was used only for testing purposes, so you can drop it.
Finally
Taking advantage of the above recommendations, the following modified function should take less time to execute:
    function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)
    {
        // Prepare $letters once: lowercase, split into characters, sort.
        $strSplitLower = preg_split('//u', mb_strtolower($letters), -1, PREG_SPLIT_NO_EMPTY);
        sort($strSplitLower);
        $result = [];
        foreach ($dictionary as $word) {
            if (mb_strlen($word) <= mb_strlen($letters)) {
                // Lowercase before sorting so that letter case cannot change the order.
                $wordSplit = preg_split('//u', mb_strtolower($word), -1, PREG_SPLIT_NO_EMPTY);
                sort($wordSplit);
                if ($wordSplit === $strSplitLower) {
                    $result[] = $word;
                }
            }
        }
        return $result;
    }
From this you might derive some improvements for your second function.
edited May 24 at 13:49
answered May 24 at 12:46
cFreed
Did you test your edited function code? @cFreed
– Otabek
May 24 at 12:57
This part of the code returns false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
– Otabek
May 24 at 13:01
@Otabek Yes, I tested it using your aip sample, and it works for me.
– cFreed
May 24 at 13:04
Please show me a real example on https://3v4l.org. It does not work for me at the moment. I use PHP 7.1.
– Otabek
May 24 at 13:12
@Otabek 3v4l.org/tBnY6.
– cFreed
May 24 at 13:21
I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:
Pseudocode:
1) Preprocessing:
    words = dictionary
    letters = ['a'..'z']
    wordDataList = []
    for each word in words:
        wordData = new wordData()
        wordData.word = word
        wordData.num = process(word)
        wordDataList.add(wordData)

    function process(word):
        num = 0
        for idx = 0 to letters.size() - 1:
            if letters[idx] in word:
                num = num + (1 << idx)   // set bit idx when that letter occurs in the word
        return num
2) Queries:
    function query(queryLetters, allowOtherLetters):
        matching = []
        num = process(queryLetters)
        for wordData in wordDataList:
            if (allowOtherLetters == false and wordData.num == num):
                matching.add(wordData.word)
            else if (allowOtherLetters and (wordData.num & num) == num):
                matching.add(wordData.word)
        return matching
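The pseudocode above could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: the names `splitLetters`, `buildAlphabet`, `letterMask` and `queryByMask` are hypothetical, the alphabet is derived from the dictionary itself so that non-Latin letters (Cyrillic, for example) also work, and it assumes at most 63 distinct letters so that every bit fits in a 64-bit PHP integer. A mask only records which letters occur, not how often, so repeated letters are not distinguished.

```php
<?php
// Hypothetical helper: split a string into lowercase UTF-8 characters.
function splitLetters(string $text): array
{
    return preg_split('//u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
}

// Assign each distinct letter in the dictionary a bit position (assumes <= 63 letters).
function buildAlphabet(array $dictionary): array
{
    $alphabet = [];
    foreach ($dictionary as $word) {
        foreach (splitLetters($word) as $char) {
            if (!isset($alphabet[$char])) {
                $alphabet[$char] = count($alphabet);
            }
        }
    }
    return $alphabet;
}

// Returns the bitmask of a string, or null if it uses a letter outside the alphabet.
function letterMask(string $text, array $alphabet): ?int
{
    $mask = 0;
    foreach (splitLetters($text) as $char) {
        if (!isset($alphabet[$char])) {
            return null;
        }
        $mask |= 1 << $alphabet[$char];
    }
    return $mask;
}

// $wordData is a list of [word, mask] pairs built during preprocessing.
function queryByMask(array $wordData, array $alphabet, string $letters, bool $allowOtherLetters): array
{
    $needle = letterMask($letters, $alphabet);
    if ($needle === null) {
        return [];   // no dictionary word contains one of the requested letters
    }
    $matching = [];
    foreach ($wordData as [$word, $mask]) {
        $ok = $allowOtherLetters ? (($mask & $needle) === $needle) : ($mask === $needle);
        if ($ok) {
            $matching[] = $word;
        }
    }
    return $matching;
}

// Preprocessing: done once, e.g. whenever the dictionary is updated.
$dictionary = ['apple', 'sample', 'api', 'pia', 'пока'];
$alphabet = buildAlphabet($dictionary);
$wordData = [];
foreach ($dictionary as $word) {
    $wordData[] = [$word, letterMask($word, $alphabet)];
}

$exact = queryByMask($wordData, $alphabet, 'aip', false);
$atLeast = queryByMask($wordData, $alphabet, 'pl', true);
```

Each query is then one integer comparison per word, with no string splitting or sorting inside the loop.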
answered May 28 at 16:40
juvian
Is this code in Python? @juvian
– Otabek
May 29 at 2:10
Can't this code find words with UTF-8 letters? My dictionary has no words with a-z letters; my words use Cyrillic letters. @juvian
– Otabek
May 29 at 2:16
@Otabek It's pseudocode, not in any particular language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters, though.
– juvian
May 29 at 4:14
Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
– juvian
May 24 at 16:41
Yes! I use a dynamic list of words from a database. The words will be updated every day. I use SQLite for the database, and it currently holds more than 90,000 words. @juvian
– Otabek
May 25 at 13:04
You can preprocess your dictionary by storing, next to each word, that word's unique letters sorted into a string. Then you sort your dictionary by these unique-letter strings. For a getWordsWithOnlySpecifiedLetters query, you can sort the letters from the input and then do a binary search on your dictionary. You can obtain the result for this query in O(log n + k), where k is the number of words that fit the criteria.
– juvian
May 25 at 19:32
How can this be implemented? With PHP or in SQL? Can you show it with example code? @juvian
– Otabek
May 27 at 6:45
Sorry, I don't know PHP; I can write pseudocode at best.
– juvian
May 27 at 6:47