Python solution to Code Jam's 'Rounding Error'

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
7
down vote

favorite
1












The "Rounding Error" problem of Round 1B of Google Code Jam 2018 is as follows:




Problem



To finally settle the age-old question of which programming language is the best, you are asking a total of N people to tell you their favorite language. This is an open-ended question: each person is free to name any language, and there are infinitely many languages in the world.



Some people have already responded, and you have gathered this information as a list of counts. For example, 1 2 means that you have asked 3 people so far, and one picked a particular language, and the other two picked some other language.



You plan to publish the results in a table listing each language and the percentage of people who picked it. You will round each percentage to the nearest integer, rounding up any percentage with a decimal part equal to or greater than 0.5. So, for example, 12.5% would round up to 13%, 99.5% would round up to 100%, and 12.4999% would round down to 12%.



In surveys like this, sometimes the rounded percentages do not add up to exactly 100. After you are done surveying the remaining people, what is the largest value that the rounded percentages could possibly add up to?



Input



The first line of the input gives the number of test cases, T. T test cases follow; each consists of two lines. The first line consists of two integers N and L: the total number of people in the survey, and the total number of different languages represented among the people who have already responded. The second line consists of L integers Ci; the ith of these is the number of people who said that the ith of the represented languages was their favorite.



Output



For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the largest value that the percentages could possibly add up to, as described above.



Limits



1 ≤ T ≤ 100.

1 ≤ L < N.

1 ≤ Ci, for all i.

The sum of all Ci values < N.

Time limit: 10 seconds per test set.

Memory limit: 1GB.




This is my Python solution. However, I get the 'Time Limit Exceeded' result even for the smallest test case. How can I speed up this solution?



from functools import reduce


def main():
def gen_sums(total, freq, remaining):
"""Generate percentages' sums"""
if not remaining:
yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
else:
seen = set()

for i in range(len(freq)):
if not freq[i] in seen:
yield from gen_sums(total,
freq[:i] + [freq[i]+1] + freq[i+1:],
remaining-1)
seen.add(freq[i])

yield from gen_sums(total, freq+[1], remaining-1)

T = int(input())

for i in range(1, T+1):
total_people, num_languages = map(int, input().split())
languages_frequency = [int(x) for x in input().split()]

if not 100 % total_people:
print('Case #: '.format(i, 100))
continue

not_responded = total_people - sum(languages_frequency)

max_percentage = max(gen_sums(total_people, languages_frequency,
not_responded))

print('Case #: '.format(i, max_percentage))

main()






share|improve this question



























    up vote
    7
    down vote

    favorite
    1












    The "Rounding Error" problem of Round 1B of Google Code Jam 2018 is as follows:




    Problem



    To finally settle the age-old question of which programming language is the best, you are asking a total of N people to tell you their favorite language. This is an open-ended question: each person is free to name any language, and there are infinitely many languages in the world.



    Some people have already responded, and you have gathered this information as a list of counts. For example, 1 2 means that you have asked 3 people so far, and one picked a particular language, and the other two picked some other language.



    You plan to publish the results in a table listing each language and the percentage of people who picked it. You will round each percentage to the nearest integer, rounding up any percentage with a decimal part equal to or greater than 0.5. So, for example, 12.5% would round up to 13%, 99.5% would round up to 100%, and 12.4999% would round down to 12%.



    In surveys like this, sometimes the rounded percentages do not add up to exactly 100. After you are done surveying the remaining people, what is the largest value that the rounded percentages could possibly add up to?



    Input



    The first line of the input gives the number of test cases, T. T test cases follow; each consists of two lines. The first line consists of two integers N and L: the total number of people in the survey, and the total number of different languages represented among the people who have already responded. The second line consists of L integers Ci; the ith of these is the number of people who said that the ith of the represented languages was their favorite.



    Output



    For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the largest value that the percentages could possibly add up to, as described above.



    Limits



    1 ≤ T ≤ 100.

    1 ≤ L < N.

    1 ≤ Ci, for all i.

    The sum of all Ci values < N.

    Time limit: 10 seconds per test set.

    Memory limit: 1GB.




    This is my Python solution. However, I get the 'Time Limit Exceeded' result even for the smallest test case. How can I speed up this solution?



    from functools import reduce


    def main():
    def gen_sums(total, freq, remaining):
    """Generate percentages' sums"""
    if not remaining:
    yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
    else:
    seen = set()

    for i in range(len(freq)):
    if not freq[i] in seen:
    yield from gen_sums(total,
    freq[:i] + [freq[i]+1] + freq[i+1:],
    remaining-1)
    seen.add(freq[i])

    yield from gen_sums(total, freq+[1], remaining-1)

    T = int(input())

    for i in range(1, T+1):
    total_people, num_languages = map(int, input().split())
    languages_frequency = [int(x) for x in input().split()]

    if not 100 % total_people:
    print('Case #: '.format(i, 100))
    continue

    not_responded = total_people - sum(languages_frequency)

    max_percentage = max(gen_sums(total_people, languages_frequency,
    not_responded))

    print('Case #: '.format(i, max_percentage))

    main()






    share|improve this question























      up vote
      7
      down vote

      favorite
      1









      up vote
      7
      down vote

      favorite
      1






      1





      The "Rounding Error" problem of Round 1B of Google Code Jam 2018 is as follows:




      Problem



      To finally settle the age-old question of which programming language is the best, you are asking a total of N people to tell you their favorite language. This is an open-ended question: each person is free to name any language, and there are infinitely many languages in the world.



      Some people have already responded, and you have gathered this information as a list of counts. For example, 1 2 means that you have asked 3 people so far, and one picked a particular language, and the other two picked some other language.



      You plan to publish the results in a table listing each language and the percentage of people who picked it. You will round each percentage to the nearest integer, rounding up any percentage with a decimal part equal to or greater than 0.5. So, for example, 12.5% would round up to 13%, 99.5% would round up to 100%, and 12.4999% would round down to 12%.



      In surveys like this, sometimes the rounded percentages do not add up to exactly 100. After you are done surveying the remaining people, what is the largest value that the rounded percentages could possibly add up to?



      Input



      The first line of the input gives the number of test cases, T. T test cases follow; each consists of two lines. The first line consists of two integers N and L: the total number of people in the survey, and the total number of different languages represented among the people who have already responded. The second line consists of L integers Ci; the ith of these is the number of people who said that the ith of the represented languages was their favorite.



      Output



      For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the largest value that the percentages could possibly add up to, as described above.



      Limits



      1 ≤ T ≤ 100.

      1 ≤ L < N.

      1 ≤ Ci, for all i.

      The sum of all Ci values < N.

      Time limit: 10 seconds per test set.

      Memory limit: 1GB.




      This is my Python solution. However, I get the 'Time Limit Exceeded' result even for the smallest test case. How can I speed up this solution?



      from functools import reduce


      def main():
      def gen_sums(total, freq, remaining):
      """Generate percentages' sums"""
      if not remaining:
      yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
      else:
      seen = set()

      for i in range(len(freq)):
      if not freq[i] in seen:
      yield from gen_sums(total,
      freq[:i] + [freq[i]+1] + freq[i+1:],
      remaining-1)
      seen.add(freq[i])

      yield from gen_sums(total, freq+[1], remaining-1)

      T = int(input())

      for i in range(1, T+1):
      total_people, num_languages = map(int, input().split())
      languages_frequency = [int(x) for x in input().split()]

      if not 100 % total_people:
      print('Case #: '.format(i, 100))
      continue

      not_responded = total_people - sum(languages_frequency)

      max_percentage = max(gen_sums(total_people, languages_frequency,
      not_responded))

      print('Case #: '.format(i, max_percentage))

      main()






      share|improve this question













      The "Rounding Error" problem of Round 1B of Google Code Jam 2018 is as follows:




      Problem



      To finally settle the age-old question of which programming language is the best, you are asking a total of N people to tell you their favorite language. This is an open-ended question: each person is free to name any language, and there are infinitely many languages in the world.



      Some people have already responded, and you have gathered this information as a list of counts. For example, 1 2 means that you have asked 3 people so far, and one picked a particular language, and the other two picked some other language.



      You plan to publish the results in a table listing each language and the percentage of people who picked it. You will round each percentage to the nearest integer, rounding up any percentage with a decimal part equal to or greater than 0.5. So, for example, 12.5% would round up to 13%, 99.5% would round up to 100%, and 12.4999% would round down to 12%.



      In surveys like this, sometimes the rounded percentages do not add up to exactly 100. After you are done surveying the remaining people, what is the largest value that the rounded percentages could possibly add up to?



      Input



      The first line of the input gives the number of test cases, T. T test cases follow; each consists of two lines. The first line consists of two integers N and L: the total number of people in the survey, and the total number of different languages represented among the people who have already responded. The second line consists of L integers Ci; the ith of these is the number of people who said that the ith of the represented languages was their favorite.



      Output



      For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is the largest value that the percentages could possibly add up to, as described above.



      Limits



      1 ≤ T ≤ 100.

      1 ≤ L < N.

      1 ≤ Ci, for all i.

      The sum of all Ci values < N.

      Time limit: 10 seconds per test set.

      Memory limit: 1GB.




      This is my Python solution. However, I get the 'Time Limit Exceeded' result even for the smallest test case. How can I speed up this solution?



      from functools import reduce


      def main():
      def gen_sums(total, freq, remaining):
      """Generate percentages' sums"""
      if not remaining:
      yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
      else:
      seen = set()

      for i in range(len(freq)):
      if not freq[i] in seen:
      yield from gen_sums(total,
      freq[:i] + [freq[i]+1] + freq[i+1:],
      remaining-1)
      seen.add(freq[i])

      yield from gen_sums(total, freq+[1], remaining-1)

      T = int(input())

      for i in range(1, T+1):
      total_people, num_languages = map(int, input().split())
      languages_frequency = [int(x) for x in input().split()]

      if not 100 % total_people:
      print('Case #: '.format(i, 100))
      continue

      not_responded = total_people - sum(languages_frequency)

      max_percentage = max(gen_sums(total_people, languages_frequency,
      not_responded))

      print('Case #: '.format(i, max_percentage))

      main()








      share|improve this question












      share|improve this question




      share|improve this question








      edited May 8 at 6:16









      Gareth Rees

      41.1k394166




      41.1k394166









      asked Apr 29 at 19:59









      Eugene Yarmash

      19618




      19618




















          2 Answers
          2






          active

          oldest

          votes

















          up vote
          3
          down vote



          accepted










          1. Review



          It is hard to test this code because it gets its data from standard input.



          This means that in order to test it or measure its performance you have to write the test case to a file and then run the code with standard input redirected from that file. It would be easier to test if the code were refactored so that the input-reading part of the code was in a separate function from the problem-solving part, like this:



          from functools import reduce

          def gen_sums(total, freq, remaining):
          """Generate percentages' sums"""
          if not remaining:
          yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
          else:
          seen = set()
          for i in range(len(freq)):
          if not freq[i] in seen:
          yield from gen_sums(total,
          freq[:i] + [freq[i]+1] + freq[i+1:],
          remaining-1)
          seen.add(freq[i])
          yield from gen_sums(total, freq+[1], remaining-1)

          def max_percentage(total_people, languages_frequency):
          if not 100 % total_people:
          return 100
          not_responded = total_people - sum(languages_frequency)
          return max(gen_sums(total_people, languages_frequency, not_responded))

          def main():
          T = int(input())
          for i in range(1, T+1):
          total_people, num_languages = map(int, input().split())
          languages_frequency = [int(x) for x in input().split()]
          result = max_percentage(total_people, languages_frequency)
          print('Case #: '.format(i, result))


          Now it's easy to test the code from the interactive interpreter or from a unit test, for example:



          >>> max_percentage(11, [1, 2, 3, 4])
          99


          and easy to measure its performance:



          >>> from timeit import timeit
          >>> timeit(lambda:max_percentage(14, ), number=1)
          8.237278351996792


          2. Performance



          As demonstrated above, the code in the post takes more than 8 seconds to figure out that when there are 14 voters (and no votes cast yet), the maximum rounded percentage is 101%.



          Why does the code in the post take so long to solve this problem? Well, it's because it carries out a search over all possible assignments of votes. But if there are $n$ voters then there are of the order of $$exppisqrt 2n over 3$$ possible assignments of votes (see the asymptotic formula for the partition numbers) and so the runtime is exponential in the number of votes.



          But in fact the problem is easy to solve by hand with a little bit of mathematics. If there are 14 voters, then each vote is worth $100 over 14 = 71over7$ percent, and so four votes are worth $284over7$ percent, which rounds up to 29. Fewer than four votes will round down, and more than four will be wasteful since the extra votes over four could be used to contribute to another block of four votes. So the maximum rounded percentage is found when we group as many of the votes as possible into blocks of four votes. In this case there are three such blocks, leaving two votes over, giving a rounded total of $29·3 + 14 = 101$.



          Similar analysis shows that if there are 19 voters, then each voter contributes $100over19 = 55over19$ percent, and so it is most efficient to put voters in pairs, because two votes contibute $1010over19$ percent, which rounds up to 11. So we have nine pairs of votes and one left over, giving a rounded total of $11·9 + 5 = 104$. But it will be a very long time before max_percentage(19, ) returns the result.



          So searching over all possible assignments of votes is not going to work even for moderately large problem sizes. Instead, you need to program the kind of mathematical analysis that I carried out in the examples above. I'm not going to spoil the problem for you by giving away my solution, but I'll just show one performance measurement:



          >>> timeit(lambda:max_percentage2(14, ), number=1)
          0.00011452200124040246


          This is more than 70,000 times faster than the code in the post.






          share|improve this answer




























            up vote
            -3
            down vote













            Obviously, if you don't save the path records, you've been, it's extremely time consuming. (Use DP)






            share|improve this answer

















            • 4




              This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
              – Vogel612♦
              Apr 30 at 7:46










            • How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
              – Eugene Yarmash
              Apr 30 at 9:07










            • @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
              – FreezePhoenix
              Apr 30 at 13:22











            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "196"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );








             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f193225%2fpython-solution-to-code-jams-rounding-error%23new-answer', 'question_page');

            );

            Post as a guest






























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            3
            down vote



            accepted










            1. Review



            It is hard to test this code because it gets its data from standard input.



            This means that in order to test it or measure its performance you have to write the test case to a file and then run the code with standard input redirected from that file. It would be easier to test if the code were refactored so that the input-reading part of the code was in a separate function from the problem-solving part, like this:



            from functools import reduce

            def gen_sums(total, freq, remaining):
            """Generate percentages' sums"""
            if not remaining:
            yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
            else:
            seen = set()
            for i in range(len(freq)):
            if not freq[i] in seen:
            yield from gen_sums(total,
            freq[:i] + [freq[i]+1] + freq[i+1:],
            remaining-1)
            seen.add(freq[i])
            yield from gen_sums(total, freq+[1], remaining-1)

            def max_percentage(total_people, languages_frequency):
            if not 100 % total_people:
            return 100
            not_responded = total_people - sum(languages_frequency)
            return max(gen_sums(total_people, languages_frequency, not_responded))

            def main():
            T = int(input())
            for i in range(1, T+1):
            total_people, num_languages = map(int, input().split())
            languages_frequency = [int(x) for x in input().split()]
            result = max_percentage(total_people, languages_frequency)
            print('Case #: '.format(i, result))


            Now it's easy to test the code from the interactive interpreter or from a unit test, for example:



            >>> max_percentage(11, [1, 2, 3, 4])
            99


            and easy to measure its performance:



            >>> from timeit import timeit
            >>> timeit(lambda:max_percentage(14, ), number=1)
            8.237278351996792


            2. Performance



            As demonstrated above, the code in the post takes more than 8 seconds to figure out that when there are 14 voters (and no votes cast yet), the maximum rounded percentage is 101%.



            Why does the code in the post take so long to solve this problem? Well, it's because it carries out a search over all possible assignments of votes. But if there are $n$ voters then there are of the order of $$exppisqrt 2n over 3$$ possible assignments of votes (see the asymptotic formula for the partition numbers) and so the runtime is exponential in the number of votes.



            But in fact the problem is easy to solve by hand with a little bit of mathematics. If there are 14 voters, then each vote is worth $100 over 14 = 71over7$ percent, and so four votes are worth $284over7$ percent, which rounds up to 29. Fewer than four votes will round down, and more than four will be wasteful since the extra votes over four could be used to contribute to another block of four votes. So the maximum rounded percentage is found when we group as many of the votes as possible into blocks of four votes. In this case there are three such blocks, leaving two votes over, giving a rounded total of $29·3 + 14 = 101$.



            Similar analysis shows that if there are 19 voters, then each voter contributes $100over19 = 55over19$ percent, and so it is most efficient to put voters in pairs, because two votes contibute $1010over19$ percent, which rounds up to 11. So we have nine pairs of votes and one left over, giving a rounded total of $11·9 + 5 = 104$. But it will be a very long time before max_percentage(19, ) returns the result.



            So searching over all possible assignments of votes is not going to work even for moderately large problem sizes. Instead, you need to program the kind of mathematical analysis that I carried out in the examples above. I'm not going to spoil the problem for you by giving away my solution, but I'll just show one performance measurement:



            >>> timeit(lambda:max_percentage2(14, ), number=1)
            0.00011452200124040246


            This is more than 70,000 times faster than the code in the post.






            share|improve this answer

























              up vote
              3
              down vote



              accepted










              1. Review



              It is hard to test this code because it gets its data from standard input.



              This means that in order to test it or measure its performance you have to write the test case to a file and then run the code with standard input redirected from that file. It would be easier to test if the code were refactored so that the input-reading part of the code was in a separate function from the problem-solving part, like this:



              from functools import reduce

              def gen_sums(total, freq, remaining):
              """Generate percentages' sums"""
              if not remaining:
              yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
              else:
              seen = set()
              for i in range(len(freq)):
              if not freq[i] in seen:
              yield from gen_sums(total,
              freq[:i] + [freq[i]+1] + freq[i+1:],
              remaining-1)
              seen.add(freq[i])
              yield from gen_sums(total, freq+[1], remaining-1)

              def max_percentage(total_people, languages_frequency):
              if not 100 % total_people:
              return 100
              not_responded = total_people - sum(languages_frequency)
              return max(gen_sums(total_people, languages_frequency, not_responded))

              def main():
              T = int(input())
              for i in range(1, T+1):
              total_people, num_languages = map(int, input().split())
              languages_frequency = [int(x) for x in input().split()]
              result = max_percentage(total_people, languages_frequency)
              print('Case #: '.format(i, result))


              Now it's easy to test the code from the interactive interpreter or from a unit test, for example:



              >>> max_percentage(11, [1, 2, 3, 4])
              99


              and easy to measure its performance:



              >>> from timeit import timeit
              >>> timeit(lambda:max_percentage(14, ), number=1)
              8.237278351996792


              2. Performance



              As demonstrated above, the code in the post takes more than 8 seconds to figure out that when there are 14 voters (and no votes cast yet), the maximum rounded percentage is 101%.



              Why does the code in the post take so long to solve this problem? Well, it's because it carries out a search over all possible assignments of votes. But if there are $n$ voters then there are of the order of $$exppisqrt 2n over 3$$ possible assignments of votes (see the asymptotic formula for the partition numbers) and so the runtime is exponential in the number of votes.



              But in fact the problem is easy to solve by hand with a little bit of mathematics. If there are 14 voters, then each vote is worth $100 over 14 = 71over7$ percent, and so four votes are worth $284over7$ percent, which rounds up to 29. Fewer than four votes will round down, and more than four will be wasteful since the extra votes over four could be used to contribute to another block of four votes. So the maximum rounded percentage is found when we group as many of the votes as possible into blocks of four votes. In this case there are three such blocks, leaving two votes over, giving a rounded total of $29·3 + 14 = 101$.



              Similar analysis shows that if there are 19 voters, then each voter contributes $100over19 = 55over19$ percent, and so it is most efficient to put voters in pairs, because two votes contibute $1010over19$ percent, which rounds up to 11. So we have nine pairs of votes and one left over, giving a rounded total of $11·9 + 5 = 104$. But it will be a very long time before max_percentage(19, ) returns the result.



              So searching over all possible assignments of votes is not going to work even for moderately large problem sizes. Instead, you need to program the kind of mathematical analysis that I carried out in the examples above. I'm not going to spoil the problem for you by giving away my solution, but I'll just show one performance measurement:



              >>> timeit(lambda:max_percentage2(14, ), number=1)
              0.00011452200124040246


              This is more than 70,000 times faster than the code in the post.






              share|improve this answer























                up vote
                3
                down vote



                accepted







                up vote
                3
                down vote



                accepted






                1. Review



                It is hard to test this code because it gets its data from standard input.



                This means that in order to test it or measure its performance you have to write the test case to a file and then run the code with standard input redirected from that file. It would be easier to test if the code were refactored so that the input-reading part of the code was in a separate function from the problem-solving part, like this:



                from functools import reduce

                def gen_sums(total, freq, remaining):
                """Generate percentages' sums"""
                if not remaining:
                yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
                else:
                seen = set()
                for i in range(len(freq)):
                if not freq[i] in seen:
                yield from gen_sums(total,
                freq[:i] + [freq[i]+1] + freq[i+1:],
                remaining-1)
                seen.add(freq[i])
                yield from gen_sums(total, freq+[1], remaining-1)

                def max_percentage(total_people, languages_frequency):
                if not 100 % total_people:
                return 100
                not_responded = total_people - sum(languages_frequency)
                return max(gen_sums(total_people, languages_frequency, not_responded))

                def main():
                T = int(input())
                for i in range(1, T+1):
                total_people, num_languages = map(int, input().split())
                languages_frequency = [int(x) for x in input().split()]
                result = max_percentage(total_people, languages_frequency)
                print('Case #: '.format(i, result))


                Now it's easy to test the code from the interactive interpreter or from a unit test, for example:



                >>> max_percentage(11, [1, 2, 3, 4])
                99


                and easy to measure its performance:



                >>> from timeit import timeit
                >>> timeit(lambda:max_percentage(14, ), number=1)
                8.237278351996792


                2. Performance



                As demonstrated above, the code in the post takes more than 8 seconds to figure out that when there are 14 voters (and no votes cast yet), the maximum rounded percentage is 101%.



                Why does the code in the post take so long to solve this problem? Well, it's because it carries out a search over all possible assignments of votes. But if there are $n$ voters then there are of the order of $$exppisqrt 2n over 3$$ possible assignments of votes (see the asymptotic formula for the partition numbers) and so the runtime is exponential in the number of votes.



                But in fact the problem is easy to solve by hand with a little bit of mathematics. If there are 14 voters, then each vote is worth $100 over 14 = 71over7$ percent, and so four votes are worth $284over7$ percent, which rounds up to 29. Fewer than four votes will round down, and more than four will be wasteful since the extra votes over four could be used to contribute to another block of four votes. So the maximum rounded percentage is found when we group as many of the votes as possible into blocks of four votes. In this case there are three such blocks, leaving two votes over, giving a rounded total of $29·3 + 14 = 101$.



                Similar analysis shows that if there are 19 voters, then each voter contributes $100over19 = 55over19$ percent, and so it is most efficient to put voters in pairs, because two votes contibute $1010over19$ percent, which rounds up to 11. So we have nine pairs of votes and one left over, giving a rounded total of $11·9 + 5 = 104$. But it will be a very long time before max_percentage(19, ) returns the result.



                So searching over all possible assignments of votes is not going to work even for moderately large problem sizes. Instead, you need to program the kind of mathematical analysis that I carried out in the examples above. I'm not going to spoil the problem for you by giving away my solution, but I'll just show one performance measurement:



                >>> timeit(lambda:max_percentage2(14, ), number=1)
                0.00011452200124040246


                This is more than 70,000 times faster than the code in the post.






                share|improve this answer













                1. Review



                It is hard to test this code because it gets its data from standard input.



                This means that in order to test it or measure its performance you have to write the test case to a file and then run the code with standard input redirected from that file. It would be easier to test if the code were refactored so that the input-reading part of the code was in a separate function from the problem-solving part, like this:



                from functools import reduce

                def gen_sums(total, freq, remaining):
                """Generate percentages' sums"""
                if not remaining:
                yield reduce(lambda x, y: x + (y*100+total//2)//total, freq, 0)
                else:
                seen = set()
                for i in range(len(freq)):
                if not freq[i] in seen:
                yield from gen_sums(total,
                freq[:i] + [freq[i]+1] + freq[i+1:],
                remaining-1)
                seen.add(freq[i])
                yield from gen_sums(total, freq+[1], remaining-1)

                def max_percentage(total_people, languages_frequency):
                if not 100 % total_people:
                return 100
                not_responded = total_people - sum(languages_frequency)
                return max(gen_sums(total_people, languages_frequency, not_responded))

                def main():
                T = int(input())
                for i in range(1, T+1):
                total_people, num_languages = map(int, input().split())
                languages_frequency = [int(x) for x in input().split()]
                result = max_percentage(total_people, languages_frequency)
                print('Case #: '.format(i, result))


                Now it's easy to test the code from the interactive interpreter or from a unit test, for example:



                >>> max_percentage(11, [1, 2, 3, 4])
                99


                and easy to measure its performance:



                >>> from timeit import timeit
                >>> timeit(lambda:max_percentage(14, ), number=1)
                8.237278351996792


                2. Performance



                As demonstrated above, the code in the post takes more than 8 seconds to figure out that when there are 14 voters (and no votes cast yet), the maximum rounded percentage is 101%.



                Why does the code in the post take so long to solve this problem? Well, it's because it carries out a search over all possible assignments of votes. But if there are $n$ voters then there are of the order of $$exppisqrt 2n over 3$$ possible assignments of votes (see the asymptotic formula for the partition numbers) and so the runtime is exponential in the number of votes.



                But in fact the problem is easy to solve by hand with a little bit of mathematics. If there are 14 voters, then each vote is worth $100 over 14 = 71over7$ percent, and so four votes are worth $284over7$ percent, which rounds up to 29. Fewer than four votes will round down, and more than four will be wasteful since the extra votes over four could be used to contribute to another block of four votes. So the maximum rounded percentage is found when we group as many of the votes as possible into blocks of four votes. In this case there are three such blocks, leaving two votes over, giving a rounded total of $29·3 + 14 = 101$.



                Similar analysis shows that if there are 19 voters, then each voter contributes $100over19 = 55over19$ percent, and so it is most efficient to put voters in pairs, because two votes contibute $1010over19$ percent, which rounds up to 11. So we have nine pairs of votes and one left over, giving a rounded total of $11·9 + 5 = 104$. But it will be a very long time before max_percentage(19, ) returns the result.



                So searching over all possible assignments of votes is not going to work even for moderately large problem sizes. Instead, you need to program the kind of mathematical analysis that I carried out in the examples above. I'm not going to spoil the problem for you by giving away my solution, but I'll just show one performance measurement:



                >>> timeit(lambda:max_percentage2(14, ), number=1)
                0.00011452200124040246


                This is more than 70,000 times faster than the code in the post.







                share|improve this answer













                share|improve this answer



                share|improve this answer











                answered May 8 at 8:19









                Gareth Rees

                41.1k394166




                41.1k394166






















                    up vote
                    -3
                    down vote













                    Obviously, if you don't save the path records, you've been, it's extremely time consuming. (Use DP)






                    share|improve this answer

















                    • 4




                      This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                      – Vogel612♦
                      Apr 30 at 7:46










                    • How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                      – Eugene Yarmash
                      Apr 30 at 9:07










                    • @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                      – FreezePhoenix
                      Apr 30 at 13:22















                    up vote
                    -3
                    down vote













                    Obviously, if you don't save the path records, you've been, it's extremely time consuming. (Use DP)






                    share|improve this answer

















                    • 4




                      This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                      – Vogel612♦
                      Apr 30 at 7:46










                    • How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                      – Eugene Yarmash
                      Apr 30 at 9:07










                    • @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                      – FreezePhoenix
                      Apr 30 at 13:22













                    up vote
                    -3
                    down vote










                    up vote
                    -3
                    down vote









                    Obviously, if you don't save the path records, you've been, it's extremely time consuming. (Use DP)






                    share|improve this answer













                    Obviously, if you don't save the path records, you've been, it's extremely time consuming. (Use DP)







                    share|improve this answer













                    share|improve this answer



                    share|improve this answer











                    answered Apr 30 at 4:07









                    user168507

                    31




                    31







                    • 4




                      This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                      – Vogel612♦
                      Apr 30 at 7:46










                    • How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                      – Eugene Yarmash
                      Apr 30 at 9:07










                    • @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                      – FreezePhoenix
                      Apr 30 at 13:22













                    • 4




                      This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                      – Vogel612♦
                      Apr 30 at 7:46










                    • How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                      – Eugene Yarmash
                      Apr 30 at 9:07










                    • @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                      – FreezePhoenix
                      Apr 30 at 13:22








                    4




                    4




                    This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                    – Vogel612♦
                    Apr 30 at 7:46




                    This needs a bit more elaboration IMO. What do you mean with "Path records, you've been"? This might be good advice, but I would not understand where to apply it. Can you clarify?
                    – Vogel612♦
                    Apr 30 at 7:46












                    How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                    – Eugene Yarmash
                    Apr 30 at 9:07




                    How exactly 'time consuming' is it? The 1st test case specifies 2 ≤ N ≤ 25 which seems like pretty low number.
                    – Eugene Yarmash
                    Apr 30 at 9:07












                    @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                    – FreezePhoenix
                    Apr 30 at 13:22





                    @EugeneYarmash time = a(x**2) is the equation for your time, where x = the number N in your equation. It's always been this way with arrays.
                    – FreezePhoenix
                    Apr 30 at 13:22













                     

                    draft saved


                    draft discarded


























                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f193225%2fpython-solution-to-code-jams-rounding-error%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Popular posts from this blog

                    Chat program with C++ and SFML

                    Function to Return a JSON Like Objects Using VBA Collections and Arrays

                    Will my employers contract hold up in court?