Retrieve words from dictionary when they meet letter requirements

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
2
down vote

favorite












I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:



function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)

foreach ($dictionary as $key => $value)
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);


$step = 0;
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
sort($strSplit);
if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit))
//echo "All specified letters from $letters are in $word


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");


This would return the words api and pia.



getWordsWithOnlySpecifiedLetters($dictionary, "leamps");


This one would return the word sample.




I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).



function getWordsWithSpecifiedLetters(array $dictionary, string $letters)

$step = 0;
mb_internal_encoding("UTF-8");
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$wordSplit = array_filter($wordSplit, function($x) use (&$strSplit)
if (in_array(strtolower($x), array_map('strtolower', $strSplit), true))
$pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
unset($strSplit[$pos]);
return false;

return true;
);
if (count(array_diff($strSplit,$wordSplit)) === 0)
//echo "$word contains all letters of $letters


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithSpecifiedLetters($dictionary, "ple");


This returns the words sample and apple.



I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?



You can download my dictionary from here and testing your code using dictionary words.







share|improve this question





















  • Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
    – juvian
    May 24 at 16:41










  • Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
    – Otabek
    May 25 at 13:04










  • You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
    – juvian
    May 25 at 19:32










  • How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
    – Otabek
    May 27 at 6:45










  • Sorry I dont know php, can write pseudocode at best
    – juvian
    May 27 at 6:47
















up vote
2
down vote

favorite












I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:



function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)

foreach ($dictionary as $key => $value)
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);


$step = 0;
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
sort($strSplit);
if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit))
//echo "All specified letters from $letters are in $word


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");


This would return the words api and pia.



getWordsWithOnlySpecifiedLetters($dictionary, "leamps");


This one would return the word sample.




I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).



function getWordsWithSpecifiedLetters(array $dictionary, string $letters)

$step = 0;
mb_internal_encoding("UTF-8");
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$wordSplit = array_filter($wordSplit, function($x) use (&$strSplit)
if (in_array(strtolower($x), array_map('strtolower', $strSplit), true))
$pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
unset($strSplit[$pos]);
return false;

return true;
);
if (count(array_diff($strSplit,$wordSplit)) === 0)
//echo "$word contains all letters of $letters


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithSpecifiedLetters($dictionary, "ple");


This returns the words sample and apple.



I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?



You can download my dictionary from here and testing your code using dictionary words.







share|improve this question





















  • Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
    – juvian
    May 24 at 16:41










  • Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
    – Otabek
    May 25 at 13:04










  • You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
    – juvian
    May 25 at 19:32










  • How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
    – Otabek
    May 27 at 6:45










  • Sorry I dont know php, can write pseudocode at best
    – juvian
    May 27 at 6:47












up vote
2
down vote

favorite









up vote
2
down vote

favorite











I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:



function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)

foreach ($dictionary as $key => $value)
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);


$step = 0;
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
sort($strSplit);
if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit))
//echo "All specified letters from $letters are in $word


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");


This would return the words api and pia.



getWordsWithOnlySpecifiedLetters($dictionary, "leamps");


This one would return the word sample.




I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).



function getWordsWithSpecifiedLetters(array $dictionary, string $letters)

$step = 0;
mb_internal_encoding("UTF-8");
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$wordSplit = array_filter($wordSplit, function($x) use (&$strSplit)
if (in_array(strtolower($x), array_map('strtolower', $strSplit), true))
$pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
unset($strSplit[$pos]);
return false;

return true;
);
if (count(array_diff($strSplit,$wordSplit)) === 0)
//echo "$word contains all letters of $letters


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithSpecifiedLetters($dictionary, "ple");


This returns the words sample and apple.



I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?



You can download my dictionary from here and testing your code using dictionary words.







share|improve this question













I have a set of functions that retrieve words from some arbitrary dictionary based on what letters they have. For example, this function gets words that use only the specified letters:



function getWordsWithOnlySpecifiedLetters(array $dictionary, string $letters)

foreach ($dictionary as $key => $value)
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);


$step = 0;
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
sort($strSplit);
if (array_map('mb_strtolower', $wordSplit) === array_map('mb_strtolower', $strSplit))
//echo "All specified letters from $letters are in $word


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithOnlySpecifiedLetters($dictionary, "aip");


This would return the words api and pia.



getWordsWithOnlySpecifiedLetters($dictionary, "leamps");


This one would return the word sample.




I also have a function that doesn't require that they exclusively use the selected letters, but rather that they use all of the specified letters (and any other letters).



function getWordsWithSpecifiedLetters(array $dictionary, string $letters)

$step = 0;
mb_internal_encoding("UTF-8");
$result = ;
foreach ($dictionary as $word)
$step++;
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$wordSplit = array_filter($wordSplit, function($x) use (&$strSplit)
if (in_array(strtolower($x), array_map('strtolower', $strSplit), true))
$pos = array_search(strtolower($x), array_map('strtolower', $strSplit), true);
unset($strSplit[$pos]);
return false;

return true;
);
if (count(array_diff($strSplit,$wordSplit)) === 0)
//echo "$word contains all letters of $letters


return $result;



Example usage:



$dictionary = ['apple', 'sample', 'api', 'pia', 'тоҷик'];
getWordsWithSpecifiedLetters($dictionary, "ple");


This returns the words sample and apple.



I have more 90000 words in my dictionary (UTF-8). This results in a very slow program; if I'm trying to find something from the full dictionary it may take tens of thousands of loops. How can I improve the performance of these functions?



You can download my dictionary from here and testing your code using dictionary words.









share|improve this question












share|improve this question




share|improve this question








edited Jun 10 at 4:34
























asked May 24 at 9:16









Otabek

169110




169110











  • Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
    – juvian
    May 24 at 16:41










  • Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
    – Otabek
    May 25 at 13:04










  • You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
    – juvian
    May 25 at 19:32










  • How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
    – Otabek
    May 27 at 6:45










  • Sorry I dont know php, can write pseudocode at best
    – juvian
    May 27 at 6:47
















  • Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
    – juvian
    May 24 at 16:41










  • Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
    – Otabek
    May 25 at 13:04










  • You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
    – juvian
    May 25 at 19:32










  • How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
    – Otabek
    May 27 at 6:45










  • Sorry I dont know php, can write pseudocode at best
    – juvian
    May 27 at 6:47















Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
– juvian
May 24 at 16:41




Is your dictionary always the same? Are you planning to use multiple getWordsWithSpecifiedLetters calls with that dictionary?
– juvian
May 24 at 16:41












Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
– Otabek
May 25 at 13:04




Yes! I use dynamic list of words from database. Words will by updated every day. I use SQLite for db. Now I have more 90000 words in db. @juvian
– Otabek
May 25 at 13:04












You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
– juvian
May 25 at 19:32




You can preprocess your dictionary by having each word and the word unique letters sorted in a string. Then you sort your dictionary by these new unique letter words. For getWordsWithOnlySpecifiedLetters query, you can sort the letters from input and then do a binary search on your dictionary. You can obtain result for this query in O(log n + k) being k the amount of words that fit the criteria
– juvian
May 25 at 19:32












How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
– Otabek
May 27 at 6:45




How can be realized it? With PHP or in SQL? Can you show with example code? @juvian
– Otabek
May 27 at 6:45












Sorry I dont know php, can write pseudocode at best
– juvian
May 27 at 6:47




Sorry I dont know php, can write pseudocode at best
– juvian
May 27 at 6:47










3 Answers
3






active

oldest

votes

















up vote
3
down vote













How about removing the dictionary preparation each time at a cost of increasing your dictionary width?



You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:



lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
икотҷ | тоҷик


Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.



SELECT `word` FROM `dictionary` WHERE `lookup` = 'икотҷ'


When you want to match the $needle characters at a minimum, you call:



SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*и.*к.*о.*т.*ҷ.*'


Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*и.*к.*о.*т.*ҷ.*~u



This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.



And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.



Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.



Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?




If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:



Code:



function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
$lettersLength = mb_strlen($letters, $encoding); // call just once and cache
$lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($lettersSplit);

$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word, $encoding) == $lettersLength)
$wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if ($wordSplit === $lettersSplit)
$result = $word;



return $result;



Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.






share|improve this answer























  • According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
    – Otabek
    Jun 7 at 6:50










  • That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
    – mickmackusa
    Jun 7 at 6:52











  • @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
    – mickmackusa
    Jun 8 at 1:04










  • Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
    – Otabek
    Jun 9 at 9:44






  • 1




    All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
    – mickmackusa
    Jun 9 at 13:05

















up vote
2
down vote













Your first function can be easily improved by two ways.



Avoid changing the contents of $dictionary.



foreach ($dictionary as $key => $value) 
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);



can be suppressed, simply inserting this test at the begin of the next foreach():



if(mb_strlen($word) <= mb_strlen($letters))


Don't repeat $letters processing.



Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).



(also drop useless code)



It appears that $step was used only for tests purpose, so you can give up.



Finally



Taking advantage of the above recommendations, the following modified script should take less time to execute:



function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)

$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$strSplitLower = array_map('mb_strtolower', $strSplit);
sort($strSplitLower);
$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word) <= mb_strlen($letters))
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
$result = $word;



return $result;



From this you might derive some improvements for your second function.






share|improve this answer























  • You tested your edit function code? @cFreed
    – Otabek
    May 24 at 12:57










  • This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
    – Otabek
    May 24 at 13:01










  • @Otabek Yes, I tested, using your aip sample. And it works for me.
    – cFreed
    May 24 at 13:04










  • Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
    – Otabek
    May 24 at 13:12










  • @Otabek 3v4l.org/tBnY6.
    – cFreed
    May 24 at 13:21

















up vote
1
down vote













I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:



Pseudocode:



1) Preprocessing:



 words = dictionary
letters = ['a'..'z']
wordDataList =

for each word in words:
wordData = new wordData()
wordData.word = word;
wordData.num = process(word)
wordDataList.add(wordData)

function process(word):
num = 0
for idx = 0 to letters.size():
if letters[idx] in word:
num = num + (1 << idx)
return num


2) Queries:



function query(letters, allowOtherLetters):
matching =
num = process(letters)
for wordData in wordDataList:
if (allowOtherLetters == false and wordData.num == num):
matching.add(wordData.word)
else if (allowOtherLetters and (wordData.num & num) == num):
matching.add(wordData.word)

return matching





share|improve this answer





















  • This code in python? @juvian
    – Otabek
    May 29 at 2:10










  • This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
    – Otabek
    May 29 at 2:16










  • @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
    – juvian
    May 29 at 4:14










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f195074%2fretrieve-words-from-dictionary-when-they-meet-letter-requirements%23new-answer', 'question_page');

);

Post as a guest






























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
3
down vote













How about removing the dictionary preparation each time at a cost of increasing your dictionary width?



You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:



lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
икотҷ | тоҷик


Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.



SELECT `word` FROM `dictionary` WHERE `lookup` = 'икотҷ'


When you want to match the $needle characters at a minimum, you call:



SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*и.*к.*о.*т.*ҷ.*'


Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*и.*к.*о.*т.*ҷ.*~u



This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.



And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.



Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.



Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?




If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:



Code:



function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
$lettersLength = mb_strlen($letters, $encoding); // call just once and cache
$lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($lettersSplit);

$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word, $encoding) == $lettersLength)
$wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if ($wordSplit === $lettersSplit)
$result = $word;



return $result;



Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.






share|improve this answer























  • According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
    – Otabek
    Jun 7 at 6:50










  • That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
    – mickmackusa
    Jun 7 at 6:52











  • @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
    – mickmackusa
    Jun 8 at 1:04










  • Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
    – Otabek
    Jun 9 at 9:44






  • 1




    All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
    – mickmackusa
    Jun 9 at 13:05














up vote
3
down vote













How about removing the dictionary preparation each time at a cost of increasing your dictionary width?



You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:



lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
икотҷ | тоҷик


Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.



SELECT `word` FROM `dictionary` WHERE `lookup` = 'икотҷ'


When you want to match the $needle characters at a minimum, you call:



SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*и.*к.*о.*т.*ҷ.*'


Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*и.*к.*о.*т.*ҷ.*~u



This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.



And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.



Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.



Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?




If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:



Code:



function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
$lettersLength = mb_strlen($letters, $encoding); // call just once and cache
$lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($lettersSplit);

$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word, $encoding) == $lettersLength)
$wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if ($wordSplit === $lettersSplit)
$result = $word;



return $result;



Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.






share|improve this answer























  • According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
    – Otabek
    Jun 7 at 6:50










  • That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
    – mickmackusa
    Jun 7 at 6:52











  • @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
    – mickmackusa
    Jun 8 at 1:04










  • Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
    – Otabek
    Jun 9 at 9:44






  • 1




    All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
    – mickmackusa
    Jun 9 at 13:05












up vote
3
down vote










up vote
3
down vote









How about removing the dictionary preparation each time at a cost of increasing your dictionary width?



You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:



lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
икотҷ | тоҷик


Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.



SELECT `word` FROM `dictionary` WHERE `lookup` = 'икотҷ'


When you want to match the $needle characters at a minimum, you call:



SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*и.*к.*о.*т.*ҷ.*'


Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*и.*к.*о.*т.*ҷ.*~u



This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.



And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.



Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.



Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?




If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:



Code:



function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
$lettersLength = mb_strlen($letters, $encoding); // call just once and cache
$lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($lettersSplit);

$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word, $encoding) == $lettersLength)
$wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if ($wordSplit === $lettersSplit)
$result = $word;



return $result;



Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.






share|improve this answer















How about removing the dictionary preparation each time at a cost of increasing your dictionary width?



You could have an alphabetized lookup column (the rows aren't alphabetized -- each letter of each word is sorted alphabetically) and a word column:



lookup | word
-----------------
aelpp | apple
aelmps | sample
aip | api
aip | pia
икотҷ | тоҷик


Using your lowercase, alphabetized $needle, when you want to find "whole" matches, you merely search the lookup column with the = operator.



SELECT `word` FROM `dictionary` WHERE `lookup` = 'икотҷ'


When you want to match the $needle characters at a minimum, you call:



SELECT `word` FROM `dictionary` WHERE `lookup` REGEXP '.*и.*к.*о.*т.*ҷ.*'


Leveraging something like this technique: Custom REGEXP Function to be used in a SQLITE SELECT Statement with this intended usage: ~.*и.*к.*о.*т.*ҷ.*~u



This, of course, is just a theoretical suggestion -- I haven't tried to do anything like this before.



And definitely remember to sanitize and escape the $needle to be offered to the query for security reasons.



Mostly I am suggesting that you sacrifice memory for speed. Only the $needle should be modified with character sorting and strtolower actions. These processes are expected to be "already done" on words prior to being stored in the dictionary.



Here is another post of mine with the same basic logic: How to best compare these two strings for values even though they are in random order?




If altering the dictionary table structure is unattractive, this is how I would recommend searching for exact character matches in any order:



Code:



function getWordsContainingTheExactSpecifedLetters_inanyorder_nomore_noless(array $dictionary, string $letters, string $encoding = 'UTF-8')
$lettersLength = mb_strlen($letters, $encoding); // call just once and cache
$lettersSplit = preg_split('//u', mb_strtolower($letters, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($lettersSplit);

$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word, $encoding) == $lettersLength)
$wordSplit = preg_split('//u', mb_strtolower($word, $encoding), null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if ($wordSplit === $lettersSplit)
$result = $word;



return $result;



Of course, you will need to change the qualifying condition if you wish to retain larger words that merely contain the letters.







share|improve this answer















share|improve this answer



share|improve this answer








edited Jun 10 at 0:22


























answered May 31 at 16:07









mickmackusa

790112




790112











  • According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
    – Otabek
    Jun 7 at 6:50










  • That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
    – mickmackusa
    Jun 7 at 6:52











  • @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
    – mickmackusa
    Jun 8 at 1:04










  • Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
    – Otabek
    Jun 9 at 9:44






  • 1




    All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
    – mickmackusa
    Jun 9 at 13:05
















  • According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
    – Otabek
    Jun 7 at 6:50










  • That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
    – mickmackusa
    Jun 7 at 6:52











  • @Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
    – mickmackusa
    Jun 8 at 1:04










  • Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
    – Otabek
    Jun 9 at 9:44






  • 1




    All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
    – mickmackusa
    Jun 9 at 13:05















According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
– Otabek
Jun 7 at 6:50




According to your algorithm, the size of the database is doubled, but the speed of searching for words is increasing @mickmackusa
– Otabek
Jun 7 at 6:50












That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
– mickmackusa
Jun 7 at 6:52





That is the sacrifice that I am proposing, yes. Whether it is "doubling" depends on if you have other columns (beyond word) in the table. Another way to express it would be to say, "at a cost of one more column". This means doing some "heavy lifting" just once and permanently storing that data in a new column. This should allow your individual searches to be conducted with more speed.
– mickmackusa
Jun 7 at 6:52













@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
– mickmackusa
Jun 8 at 1:04




@Otabek Did you try my suggestion? I'm curious how much performance improved. Or is the idea of an expanded, purpose-built table unappealing for you?
– mickmackusa
Jun 8 at 1:04












Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
– Otabek
Jun 9 at 9:44




Honestly, I have not tried your offer yet. Of course, I'll practice your proposal and test the speed of the search and then tell you about the result. But still such a big load to the computer's memory is not suitable for me. Do you know of other search options with minimal load to the computer's memory? @mickmackusa
– Otabek
Jun 9 at 9:44




1




1




All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
– mickmackusa
Jun 9 at 13:05




All that said, my overarching advice - because performance is the priority - is to bake all of the bread before you open the bakery to customers, then you only have one job to do.
– mickmackusa
Jun 9 at 13:05












up vote
2
down vote













Your first function can be easily improved by two ways.



Avoid changing the contents of $dictionary.



foreach ($dictionary as $key => $value) 
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);



can be suppressed, simply inserting this test at the begin of the next foreach():



if(mb_strlen($word) <= mb_strlen($letters))


Don't repeat $letters processing.



Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).



(also drop useless code)



It appears that $step was used only for tests purpose, so you can give up.



Finally



Taking advantage of the above recommendations, the following modified script should take less time to execute:



function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)

$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$strSplitLower = array_map('mb_strtolower', $strSplit);
sort($strSplitLower);
$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word) <= mb_strlen($letters))
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
$result = $word;



return $result;



From this you might derive some improvements for your second function.






share|improve this answer























  • You tested your edit function code? @cFreed
    – Otabek
    May 24 at 12:57










  • This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
    – Otabek
    May 24 at 13:01










  • @Otabek Yes, I tested, using your aip sample. And it works for me.
    – cFreed
    May 24 at 13:04










  • Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
    – Otabek
    May 24 at 13:12










  • @Otabek 3v4l.org/tBnY6.
    – cFreed
    May 24 at 13:21














up vote
2
down vote













Your first function can be easily improved by two ways.



Avoid changing the contents of $dictionary.



foreach ($dictionary as $key => $value) 
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);



can be suppressed, simply inserting this test at the begin of the next foreach():



if(mb_strlen($word) <= mb_strlen($letters))


Don't repeat $letters processing.



Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).



(also drop useless code)



It appears that $step was used only for tests purpose, so you can give up.



Finally



Taking advantage of the above recommendations, the following modified script should take less time to execute:



function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)

$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$strSplitLower = array_map('mb_strtolower', $strSplit);
sort($strSplitLower);
$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word) <= mb_strlen($letters))
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
$result = $word;



return $result;



From this you might derive some improvements for your second function.






share|improve this answer























  • You tested your edit function code? @cFreed
    – Otabek
    May 24 at 12:57










  • This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
    – Otabek
    May 24 at 13:01










  • @Otabek Yes, I tested, using your aip sample. And it works for me.
    – cFreed
    May 24 at 13:04










  • Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
    – Otabek
    May 24 at 13:12










  • @Otabek 3v4l.org/tBnY6.
    – cFreed
    May 24 at 13:21












up vote
2
down vote










up vote
2
down vote









Your first function can be easily improved by two ways.



Avoid changing the contents of $dictionary.



foreach ($dictionary as $key => $value) 
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);



can be suppressed, simply inserting this test at the begin of the next foreach():



if(mb_strlen($word) <= mb_strlen($letters))


Don't repeat $letters processing.



Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).



(also drop useless code)



It appears that $step was used only for tests purpose, so you can give up.



Finally



Taking advantage of the above recommendations, the following modified script should take less time to execute:



function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)

$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$strSplitLower = array_map('mb_strtolower', $strSplit);
sort($strSplitLower);
$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word) <= mb_strlen($letters))
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
$result = $word;



return $result;



From this you might derive some improvements for your second function.






share|improve this answer















Your first function can be easily improved by two ways.



Avoid changing the contents of $dictionary.



foreach ($dictionary as $key => $value) 
if(mb_strlen($value) > mb_strlen($letters)) unset($dictionary[$key]);



can be suppressed, simply inserting this test at the begin of the next foreach():



if(mb_strlen($word) <= mb_strlen($letters))


Don't repeat $letters processing.



Currently you're sorting $strSplit at each foreach() step, while it can be done once for all before entering loop.
Likewise for array_map('mb_strtolower', $strSplit).



(also drop useless code)



It appears that $step was used only for tests purpose, so you can give up.



Finally



Taking advantage of the above recommendations, the following modified script should take less time to execute:



function getWordsWithOnlySpecifedLetters(array $dictionary, string $letters)

$strSplit = preg_split('//u', $letters, null, PREG_SPLIT_NO_EMPTY);
$strSplitLower = array_map('mb_strtolower', $strSplit);
sort($strSplitLower);
$result = ;
foreach ($dictionary as $word)
if(mb_strlen($word) <= mb_strlen($letters))
$wordSplit = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
sort($wordSplit);
if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
$result = $word;



return $result;



From this you might derive some improvements for your second function.







share|improve this answer















share|improve this answer



share|improve this answer








edited May 24 at 13:49


























answered May 24 at 12:46









cFreed

2,438719




2,438719











  • You tested your edit function code? @cFreed
    – Otabek
    May 24 at 12:57










  • This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
    – Otabek
    May 24 at 13:01










  • @Otabek Yes, I tested, using your aip sample. And it works for me.
    – cFreed
    May 24 at 13:04










  • Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
    – Otabek
    May 24 at 13:12










  • @Otabek 3v4l.org/tBnY6.
    – cFreed
    May 24 at 13:21
















  • You tested your edit function code? @cFreed
    – Otabek
    May 24 at 12:57










  • This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
    – Otabek
    May 24 at 13:01










  • @Otabek Yes, I tested, using your aip sample. And it works for me.
    – cFreed
    May 24 at 13:04










  • Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
    – Otabek
    May 24 at 13:12










  • @Otabek 3v4l.org/tBnY6.
    – cFreed
    May 24 at 13:21















You tested your edit function code? @cFreed
– Otabek
May 24 at 12:57




You tested your edit function code? @cFreed
– Otabek
May 24 at 12:57












This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
– Otabek
May 24 at 13:01




This part of code return false: if (array_map('mb_strtolower', $wordSplit) === $strSplitLower)
– Otabek
May 24 at 13:01












@Otabek Yes, I tested, using your aip sample. And it works for me.
– cFreed
May 24 at 13:04




@Otabek Yes, I tested, using your aip sample. And it works for me.
– cFreed
May 24 at 13:04












Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
– Otabek
May 24 at 13:12




Please show me real example in https://3v4l.org. For me it not work now. I use php 7.1
– Otabek
May 24 at 13:12












@Otabek 3v4l.org/tBnY6.
– cFreed
May 24 at 13:21




@Otabek 3v4l.org/tBnY6.
– cFreed
May 24 at 13:21










up vote
1
down vote













I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:



Pseudocode:



1) Preprocessing:



 words = dictionary
letters = ['a'..'z']
wordDataList =

for each word in words:
wordData = new wordData()
wordData.word = word;
wordData.num = process(word)
wordDataList.add(wordData)

function process(word):
num = 0
for idx = 0 to letters.size():
if letters[idx] in word:
num = num + (1 << idx)
return num


2) Queries:



function query(letters, allowOtherLetters):
matching =
num = process(letters)
for wordData in wordDataList:
if (allowOtherLetters == false and wordData.num == num):
matching.add(wordData.word)
else if (allowOtherLetters and (wordData.num & num) == num):
matching.add(wordData.word)

return matching





share|improve this answer





















  • This code in python? @juvian
    – Otabek
    May 29 at 2:10










  • This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
    – Otabek
    May 29 at 2:16










  • @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
    – juvian
    May 29 at 4:14














up vote
1
down vote













I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:



Pseudocode:



1) Preprocessing:



 words = dictionary
letters = ['a'..'z']
wordDataList =

for each word in words:
wordData = new wordData()
wordData.word = word;
wordData.num = process(word)
wordDataList.add(wordData)

function process(word):
num = 0
for idx = 0 to letters.size():
if letters[idx] in word:
num = num + (1 << idx)
return num


2) Queries:



function query(letters, allowOtherLetters):
matching =
num = process(letters)
for wordData in wordDataList:
if (allowOtherLetters == false and wordData.num == num):
matching.add(wordData.word)
else if (allowOtherLetters and (wordData.num & num) == num):
matching.add(wordData.word)

return matching





share|improve this answer





















  • This code in python? @juvian
    – Otabek
    May 29 at 2:10










  • This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
    – Otabek
    May 29 at 2:16










  • @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
    – juvian
    May 29 at 4:14












up vote
1
down vote










up vote
1
down vote









I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:



Pseudocode:



1) Preprocessing:



 words = dictionary
letters = ['a'..'z']
wordDataList =

for each word in words:
wordData = new wordData()
wordData.word = word;
wordData.num = process(word)
wordDataList.add(wordData)

function process(word):
num = 0
for idx = 0 to letters.size():
if letters[idx] in word:
num = num + (1 << idx)
return num


2) Queries:



function query(letters, allowOtherLetters):
matching =
num = process(letters)
for wordData in wordDataList:
if (allowOtherLetters == false and wordData.num == num):
matching.add(wordData.word)
else if (allowOtherLetters and (wordData.num & num) == num):
matching.add(wordData.word)

return matching





share|improve this answer













I will assume all your words only have a-z characters. With that, an efficient check can be made by preprocessing your dictionary:



Pseudocode:



1) Preprocessing:



 words = dictionary
letters = ['a'..'z']
wordDataList =

for each word in words:
wordData = new wordData()
wordData.word = word;
wordData.num = process(word)
wordDataList.add(wordData)

function process(word):
num = 0
for idx = 0 to letters.size():
if letters[idx] in word:
num = num + (1 << idx)
return num


2) Queries:



function query(letters, allowOtherLetters):
matching =
num = process(letters)
for wordData in wordDataList:
if (allowOtherLetters == false and wordData.num == num):
matching.add(wordData.word)
else if (allowOtherLetters and (wordData.num & num) == num):
matching.add(wordData.word)

return matching






share|improve this answer













share|improve this answer



share|improve this answer











answered May 28 at 16:40









juvian

85838




85838











  • This code in python? @juvian
    – Otabek
    May 29 at 2:10










  • This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
    – Otabek
    May 29 at 2:16










  • @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
    – juvian
    May 29 at 4:14
















  • This code in python? @juvian
    – Otabek
    May 29 at 2:10










  • This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
    – Otabek
    May 29 at 2:16










  • @Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
    – juvian
    May 29 at 4:14















This code in python? @juvian
– Otabek
May 29 at 2:10




This code in python? @juvian
– Otabek
May 29 at 2:10












This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
– Otabek
May 29 at 2:16




This code can't found word with utf-8 letters? I have not words with (a-z ) letters in my dictionary. My words in cyrillic letters. @juvian
– Otabek
May 29 at 2:16












@Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
– juvian
May 29 at 4:14




@Otabek its pseudocode, not in any language. This would work for any letters, but only 32 of them. It would be easy to extend to more letters though
– juvian
May 29 at 4:14












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f195074%2fretrieve-words-from-dictionary-when-they-meet-letter-requirements%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation