Finding the best terms to use

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
5
down vote

favorite












I have a real-world problem involving finding the best known term from a string containing sets of terms. We have Term objects that are comparable in that they have a method a_term.is_better_than(other_term). This method is Boolean -- if it returns False, then a_term may be as good as other_term, but isn't better.



Using this method, I need to pare down a set of terms to the best ones. The best way I've been able to think about this is to compare pairs of terms and eliminate any terms when one isn't as good as the other, something like this:



def get_best_terms(terms: Set[Term]): Set[Term]
best_terms = set()
m_terms = list(terms)
while m_terms:
term = m_terms.pop()
for i in reversed(range(len(m_terms))):
if term.is_better_than(m_terms[i]):
del m_terms[i]
if m_terms[i].is_better_than(term):
break
else: # We never encountered the `break` statement
best_terms.add(term)
break
best_terms |= a_term for a_term in m_terms
if not (term.is_better_than(a_term) or
a_term.is_better_than(term))
return best_terms


This approach is functional, but pretty gross in my opinion. For one thing, there's a lot of mutation going on here. For another, once I've identified a single best term, finding all the terms that are equally good is messy. The whole thing is hard to read.



What do you suggest?



Examples



A Term is a string with some metadata attached. For example,



a = AutoTerm(term='honda', term_span=1, term_range=(1, 2),
original_term='CAR HONDA ACC...')
b = AutoTerm(term='2015 honda accord', term_span=2, term_range=(1, 3),
original_term='2015 HONDA ACC...')
c = AutoTerm(term='2015 accord', ...)


Sometimes a term is a better choice because it's more specific, but there are other nuances, such as a match in a database. Let's say, arbitrarily, that b and c are equally good, but a is not as good. Then



assert get_best_terms(a, b, c) == b, c


should pass.







share|improve this question





















  • Can you provide an example of Term and some example IO?
    – Peilonrayz
    Jul 31 at 13:57










  • @Peilonrayz done, thanks
    – kojiro
    Jul 31 at 14:25










  • Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
    – David Conrad
    Jul 31 at 20:01






  • 1




    @DavidConrad yes.
    – kojiro
    Jul 31 at 20:06
















up vote
5
down vote

favorite












I have a real-world problem involving finding the best known term from a string containing sets of terms. We have Term objects that are comparable in that they have a method a_term.is_better_than(other_term). This method is Boolean -- if it returns False, then a_term may be as good as other_term, but isn't better.



Using this method, I need to pare down a set of terms to the best ones. The best way I've been able to think about this is to compare pairs of terms and eliminate any terms when one isn't as good as the other, something like this:



def get_best_terms(terms: Set[Term]): Set[Term]
best_terms = set()
m_terms = list(terms)
while m_terms:
term = m_terms.pop()
for i in reversed(range(len(m_terms))):
if term.is_better_than(m_terms[i]):
del m_terms[i]
if m_terms[i].is_better_than(term):
break
else: # We never encountered the `break` statement
best_terms.add(term)
break
best_terms |= a_term for a_term in m_terms
if not (term.is_better_than(a_term) or
a_term.is_better_than(term))
return best_terms


This approach is functional, but pretty gross in my opinion. For one thing, there's a lot of mutation going on here. For another, once I've identified a single best term, finding all the terms that are equally good is messy. The whole thing is hard to read.



What do you suggest?



Examples



A Term is a string with some metadata attached. For example,



a = AutoTerm(term='honda', term_span=1, term_range=(1, 2),
original_term='CAR HONDA ACC...')
b = AutoTerm(term='2015 honda accord', term_span=2, term_range=(1, 3),
original_term='2015 HONDA ACC...')
c = AutoTerm(term='2015 accord', ...)


Sometimes a term is a better choice because it's more specific, but there are other nuances, such as a match in a database. Let's say, arbitrarily, that b and c are equally good, but a is not as good. Then



assert get_best_terms(a, b, c) == b, c


should pass.







share|improve this question





















  • Can you provide an example of Term and some example IO?
    – Peilonrayz
    Jul 31 at 13:57










  • @Peilonrayz done, thanks
    – kojiro
    Jul 31 at 14:25










  • Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
    – David Conrad
    Jul 31 at 20:01






  • 1




    @DavidConrad yes.
    – kojiro
    Jul 31 at 20:06












up vote
5
down vote

favorite









up vote
5
down vote

favorite











I have a real-world problem involving finding the best known term from a string containing sets of terms. We have Term objects that are comparable in that they have a method a_term.is_better_than(other_term). This method is Boolean -- if it returns False, then a_term may be as good as other_term, but isn't better.



Using this method, I need to pare down a set of terms to the best ones. The best way I've been able to think about this is to compare pairs of terms and eliminate any terms when one isn't as good as the other, something like this:



def get_best_terms(terms: Set[Term]): Set[Term]
best_terms = set()
m_terms = list(terms)
while m_terms:
term = m_terms.pop()
for i in reversed(range(len(m_terms))):
if term.is_better_than(m_terms[i]):
del m_terms[i]
if m_terms[i].is_better_than(term):
break
else: # We never encountered the `break` statement
best_terms.add(term)
break
best_terms |= a_term for a_term in m_terms
if not (term.is_better_than(a_term) or
a_term.is_better_than(term))
return best_terms


This approach is functional, but pretty gross in my opinion. For one thing, there's a lot of mutation going on here. For another, once I've identified a single best term, finding all the terms that are equally good is messy. The whole thing is hard to read.



What do you suggest?



Examples



A Term is a string with some metadata attached. For example,



a = AutoTerm(term='honda', term_span=1, term_range=(1, 2),
original_term='CAR HONDA ACC...')
b = AutoTerm(term='2015 honda accord', term_span=2, term_range=(1, 3),
original_term='2015 HONDA ACC...')
c = AutoTerm(term='2015 accord', ...)


Sometimes a term is a better choice because it's more specific, but there are other nuances, such as a match in a database. Let's say, arbitrarily, that b and c are equally good, but a is not as good. Then



assert get_best_terms(a, b, c) == b, c


should pass.







share|improve this question













I have a real-world problem involving finding the best known term from a string containing sets of terms. We have Term objects that are comparable in that they have a method a_term.is_better_than(other_term). This method is Boolean -- if it returns False, then a_term may be as good as other_term, but isn't better.



Using this method, I need to pare down a set of terms to the best ones. The best way I've been able to think about this is to compare pairs of terms and eliminate any terms when one isn't as good as the other, something like this:



def get_best_terms(terms: Set[Term]): Set[Term]
best_terms = set()
m_terms = list(terms)
while m_terms:
term = m_terms.pop()
for i in reversed(range(len(m_terms))):
if term.is_better_than(m_terms[i]):
del m_terms[i]
if m_terms[i].is_better_than(term):
break
else: # We never encountered the `break` statement
best_terms.add(term)
break
best_terms |= a_term for a_term in m_terms
if not (term.is_better_than(a_term) or
a_term.is_better_than(term))
return best_terms


This approach is functional, but pretty gross in my opinion. For one thing, there's a lot of mutation going on here. For another, once I've identified a single best term, finding all the terms that are equally good is messy. The whole thing is hard to read.



What do you suggest?



Examples



A Term is a string with some metadata attached. For example,



a = AutoTerm(term='honda', term_span=1, term_range=(1, 2),
original_term='CAR HONDA ACC...')
b = AutoTerm(term='2015 honda accord', term_span=2, term_range=(1, 3),
original_term='2015 HONDA ACC...')
c = AutoTerm(term='2015 accord', ...)


Sometimes a term is a better choice because it's more specific, but there are other nuances, such as a match in a database. Let's say, arbitrarily, that b and c are equally good, but a is not as good. Then



assert get_best_terms(a, b, c) == b, c


should pass.









share|improve this question












share|improve this question




share|improve this question








edited Jul 31 at 14:25
























asked Jul 31 at 13:17









kojiro

1,30211021




1,30211021











  • Can you provide an example of Term and some example IO?
    – Peilonrayz
    Jul 31 at 13:57










  • @Peilonrayz done, thanks
    – kojiro
    Jul 31 at 14:25










  • Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
    – David Conrad
    Jul 31 at 20:01






  • 1




    @DavidConrad yes.
    – kojiro
    Jul 31 at 20:06
















  • Can you provide an example of Term and some example IO?
    – Peilonrayz
    Jul 31 at 13:57










  • @Peilonrayz done, thanks
    – kojiro
    Jul 31 at 14:25










  • Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
    – David Conrad
    Jul 31 at 20:01






  • 1




    @DavidConrad yes.
    – kojiro
    Jul 31 at 20:06















Can you provide an example of Term and some example IO?
– Peilonrayz
Jul 31 at 13:57




Can you provide an example of Term and some example IO?
– Peilonrayz
Jul 31 at 13:57












@Peilonrayz done, thanks
– kojiro
Jul 31 at 14:25




@Peilonrayz done, thanks
– kojiro
Jul 31 at 14:25












Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
– David Conrad
Jul 31 at 20:01




Is the ranking of Terms transitive? If Term A is better than Term B, and Term B is better than Term C, is Term A guaranteed to be better than Term C?
– David Conrad
Jul 31 at 20:01




1




1




@DavidConrad yes.
– kojiro
Jul 31 at 20:06




@DavidConrad yes.
– kojiro
Jul 31 at 20:06










2 Answers
2






active

oldest

votes

















up vote
5
down vote



accepted










  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError.
    Instead use a worse set so that you know what to skip.

  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted.
    To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:



import functools
import itertools
from typing import Set, List


class Term(int):
def is_better_than(self, other):
return self // 2 > other // 2


def cmp(self: Term, other: Term):
if self.is_better_than(other):
return 1
if other.is_better_than(self):
return -1
return 0


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True)
best = terms[0]
return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms))


print(get_best_terms(map(Term, [1, 2, 3, 4, 5])))
print(get_best_terms(map(Term, [1, 2, 3, 4])))
print(get_best_terms(map(Term, [5, 1, 2, 4, 3])))
print(get_best_terms(map(Term, [5, 1, 4, 3])))


This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.



@functools.total_ordering
class Term(int):
def __gt__(self, other: Term) -> bool:
return self // 2 > other // 2

def __eq__(self, other: Term) -> bool:
return self // 2 == other // 2


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, reverse=True)
return list(itertools.takewhile(terms[0].__eq__, terms))





share|improve this answer























  • Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
    – kojiro
    Jul 31 at 15:19










  • "in your for without breaking" for loop?
    – Acccumulation
    Jul 31 at 15:26










  • @Acccumulation Yes, I'll try make that clearer
    – Peilonrayz
    Jul 31 at 15:27










  • @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
    – Peilonrayz
    Jul 31 at 15:34










  • It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
    – kojiro
    Jul 31 at 15:36

















up vote
3
down vote













If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.



best_terms = 
for term in terms:
if not best_terms:
best_terms = [term]
continue
if not best_terms[0].is_better_than(term):
if term.is_better_than(best_terms[0]):
best_terms = [term]
else
best_terms.append(term)





share|improve this answer



















  • 1




    Please tell me you changed your account name just to give this answer, because that'd be about perfect.
    – kojiro
    Jul 31 at 15:47










  • Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
    – coconochao
    Jul 31 at 20:14










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200663%2ffinding-the-best-terms-to-use%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
5
down vote



accepted










  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError.
    Instead use a worse set so that you know what to skip.

  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted.
    To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:



import functools
import itertools
from typing import Set, List


class Term(int):
def is_better_than(self, other):
return self // 2 > other // 2


def cmp(self: Term, other: Term):
if self.is_better_than(other):
return 1
if other.is_better_than(self):
return -1
return 0


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True)
best = terms[0]
return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms))


print(get_best_terms(map(Term, [1, 2, 3, 4, 5])))
print(get_best_terms(map(Term, [1, 2, 3, 4])))
print(get_best_terms(map(Term, [5, 1, 2, 4, 3])))
print(get_best_terms(map(Term, [5, 1, 4, 3])))


This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.



@functools.total_ordering
class Term(int):
def __gt__(self, other: Term) -> bool:
return self // 2 > other // 2

def __eq__(self, other: Term) -> bool:
return self // 2 == other // 2


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, reverse=True)
return list(itertools.takewhile(terms[0].__eq__, terms))





share|improve this answer























  • Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
    – kojiro
    Jul 31 at 15:19










  • "in your for without breaking" for loop?
    – Acccumulation
    Jul 31 at 15:26










  • @Acccumulation Yes, I'll try make that clearer
    – Peilonrayz
    Jul 31 at 15:27










  • @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
    – Peilonrayz
    Jul 31 at 15:34










  • It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
    – kojiro
    Jul 31 at 15:36














up vote
5
down vote



accepted










  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError.
    Instead use a worse set so that you know what to skip.

  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted.
    To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:



import functools
import itertools
from typing import Set, List


class Term(int):
def is_better_than(self, other):
return self // 2 > other // 2


def cmp(self: Term, other: Term):
if self.is_better_than(other):
return 1
if other.is_better_than(self):
return -1
return 0


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True)
best = terms[0]
return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms))


print(get_best_terms(map(Term, [1, 2, 3, 4, 5])))
print(get_best_terms(map(Term, [1, 2, 3, 4])))
print(get_best_terms(map(Term, [5, 1, 2, 4, 3])))
print(get_best_terms(map(Term, [5, 1, 4, 3])))


This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.



@functools.total_ordering
class Term(int):
def __gt__(self, other: Term) -> bool:
return self // 2 > other // 2

def __eq__(self, other: Term) -> bool:
return self // 2 == other // 2


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, reverse=True)
return list(itertools.takewhile(terms[0].__eq__, terms))





share|improve this answer























  • Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
    – kojiro
    Jul 31 at 15:19










  • "in your for without breaking" for loop?
    – Acccumulation
    Jul 31 at 15:26










  • @Acccumulation Yes, I'll try make that clearer
    – Peilonrayz
    Jul 31 at 15:27










  • @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
    – Peilonrayz
    Jul 31 at 15:34










  • It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
    – kojiro
    Jul 31 at 15:36












up vote
5
down vote



accepted







up vote
5
down vote



accepted






  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError.
    Instead use a worse set so that you know what to skip.

  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted.
    To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:



import functools
import itertools
from typing import Set, List


class Term(int):
def is_better_than(self, other):
return self // 2 > other // 2


def cmp(self: Term, other: Term):
if self.is_better_than(other):
return 1
if other.is_better_than(self):
return -1
return 0


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True)
best = terms[0]
return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms))


print(get_best_terms(map(Term, [1, 2, 3, 4, 5])))
print(get_best_terms(map(Term, [1, 2, 3, 4])))
print(get_best_terms(map(Term, [5, 1, 2, 4, 3])))
print(get_best_terms(map(Term, [5, 1, 4, 3])))


This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.



@functools.total_ordering
class Term(int):
def __gt__(self, other: Term) -> bool:
return self // 2 > other // 2

def __eq__(self, other: Term) -> bool:
return self // 2 == other // 2


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, reverse=True)
return list(itertools.takewhile(terms[0].__eq__, terms))





share|improve this answer















  1. You have a bug, if you remove an item in your for loop without breaking you get an IndexError.
    Instead use a worse set so that you know what to skip.

  2. What you seem to be doing is performing a sort, and so if you make a cmp function you can use sorted.
    To do this you need to use functools.cmp_to_key and then you can itertools.takewhile.

And so I'd use:



import functools
import itertools
from typing import Set, List


class Term(int):
def is_better_than(self, other):
return self // 2 > other // 2


def cmp(self: Term, other: Term):
if self.is_better_than(other):
return 1
if other.is_better_than(self):
return -1
return 0


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, key=functools.cmp_to_key(cmp), reverse=True)
best = terms[0]
return list(itertools.takewhile(lambda i: not best.is_better_than(i), terms))


print(get_best_terms(map(Term, [1, 2, 3, 4, 5])))
print(get_best_terms(map(Term, [1, 2, 3, 4])))
print(get_best_terms(map(Term, [5, 1, 2, 4, 3])))
print(get_best_terms(map(Term, [5, 1, 4, 3])))


This can however be simplified if Terms are comparable, as you won't need to use key and can remove the need for a lambda.



@functools.total_ordering
class Term(int):
def __gt__(self, other: Term) -> bool:
return self // 2 > other // 2

def __eq__(self, other: Term) -> bool:
return self // 2 == other // 2


def get_best_terms(terms: Set[Term]) -> List[Term]:
if not terms:
return
terms = sorted(terms, reverse=True)
return list(itertools.takewhile(terms[0].__eq__, terms))






share|improve this answer















share|improve this answer



share|improve this answer








edited Jul 31 at 15:27


























answered Jul 31 at 14:41









Peilonrayz

24.3k336101




24.3k336101











  • Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
    – kojiro
    Jul 31 at 15:19










  • "in your for without breaking" for loop?
    – Acccumulation
    Jul 31 at 15:26










  • @Acccumulation Yes, I'll try make that clearer
    – Peilonrayz
    Jul 31 at 15:27










  • @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
    – Peilonrayz
    Jul 31 at 15:34










  • It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
    – kojiro
    Jul 31 at 15:36
















  • Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
    – kojiro
    Jul 31 at 15:19










  • "in your for without breaking" for loop?
    – Acccumulation
    Jul 31 at 15:26










  • @Acccumulation Yes, I'll try make that clearer
    – Peilonrayz
    Jul 31 at 15:27










  • @kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
    – Peilonrayz
    Jul 31 at 15:34










  • It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
    – kojiro
    Jul 31 at 15:36















Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
– kojiro
Jul 31 at 15:19




Yeah, this is great, thanks. I had thought about using total_ordering and sorting, but I was thinking I didn't actually need to sort. But in looking again, I think the gain in readability is worth every penny.
– kojiro
Jul 31 at 15:19












"in your for without breaking" for loop?
– Acccumulation
Jul 31 at 15:26




"in your for without breaking" for loop?
– Acccumulation
Jul 31 at 15:26












@Acccumulation Yes, I'll try make that clearer
– Peilonrayz
Jul 31 at 15:27




@Acccumulation Yes, I'll try make that clearer
– Peilonrayz
Jul 31 at 15:27












@kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
– Peilonrayz
Jul 31 at 15:34




@kojiro You didn't say in your question but is Term.is_better_than costly? Because you could make the code run in $O(n)$ time, with a worst case $2n$ calls to Term.is_best_terms if they are all equal.
– Peilonrayz
Jul 31 at 15:34












It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
– kojiro
Jul 31 at 15:36




It's not costly afaict. It is impossible to understand, though, which is why I am still having trouble writing a satisfactory __eq__ for total ordering. I'm also considering using max and a filter instead of takewhile, which trades the sorted for max 2x passes over the set.
– kojiro
Jul 31 at 15:36












up vote
3
down vote













If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.



best_terms = 
for term in terms:
if not best_terms:
best_terms = [term]
continue
if not best_terms[0].is_better_than(term):
if term.is_better_than(best_terms[0]):
best_terms = [term]
else
best_terms.append(term)





share|improve this answer



















  • 1




    Please tell me you changed your account name just to give this answer, because that'd be about perfect.
    – kojiro
    Jul 31 at 15:47










  • Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
    – coconochao
    Jul 31 at 20:14














up vote
3
down vote













If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.



best_terms = 
for term in terms:
if not best_terms:
best_terms = [term]
continue
if not best_terms[0].is_better_than(term):
if term.is_better_than(best_terms[0]):
best_terms = [term]
else
best_terms.append(term)





share|improve this answer



















  • 1




    Please tell me you changed your account name just to give this answer, because that'd be about perfect.
    – kojiro
    Jul 31 at 15:47










  • Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
    – coconochao
    Jul 31 at 20:14












up vote
3
down vote










up vote
3
down vote









If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.



best_terms = 
for term in terms:
if not best_terms:
best_terms = [term]
continue
if not best_terms[0].is_better_than(term):
if term.is_better_than(best_terms[0]):
best_terms = [term]
else
best_terms.append(term)





share|improve this answer















If all you want are the best terms, then it seems to me that you can just keep a list of the current best terms. Every time you find a term that at least as good as any one of the previous best terms, then, assuming transitivity, it must be at least as good as all the previous best terms. And since the previous best terms were at least as good as all the terms tested so far, the new term is at least as good as all the terms tested so for. Next you check whether the new term is strictly better than one of the current best terms. If so, then you can replace the list of best terms with a list containing just the new term. If not, then the current term is just as good as the current best terms, so you should append it to the list of best terms.



best_terms = 
for term in terms:
if not best_terms:
best_terms = [term]
continue
if not best_terms[0].is_better_than(term):
if term.is_better_than(best_terms[0]):
best_terms = [term]
else
best_terms.append(term)






share|improve this answer















share|improve this answer



share|improve this answer








edited Jul 31 at 21:50


























answered Jul 31 at 15:46









Acccumulation

9395




9395







  • 1




    Please tell me you changed your account name just to give this answer, because that'd be about perfect.
    – kojiro
    Jul 31 at 15:47










  • Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
    – coconochao
    Jul 31 at 20:14












  • 1




    Please tell me you changed your account name just to give this answer, because that'd be about perfect.
    – kojiro
    Jul 31 at 15:47










  • Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
    – coconochao
    Jul 31 at 20:14







1




1




Please tell me you changed your account name just to give this answer, because that'd be about perfect.
– kojiro
Jul 31 at 15:47




Please tell me you changed your account name just to give this answer, because that'd be about perfect.
– kojiro
Jul 31 at 15:47












Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
– coconochao
Jul 31 at 20:14




Except "is_better_than" returns false if the terms are equally good. You seem to be considering otherwise...
– coconochao
Jul 31 at 20:14












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200663%2ffinding-the-best-terms-to-use%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation