Flipping coins performance

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
17
down vote

favorite
1












This is the Flipping coins problem on CodeChef:




There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:



  1. Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B"


  2. Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".


Input:



The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.



Output:



Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.




Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?



n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1






share|improve this question

















  • 3




    It seems that some people succeeded using pypy
    – oliverpool
    Jan 25 at 16:39






  • 11




    #do Something is worse than no comment at all
    – Bailey Parker
    Jan 26 at 0:07






  • 1




    You don't need to list before you unpack. Unpacking works with generators.
    – Bailey Parker
    Jan 26 at 0:08






  • 1




    The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
    – Barmar
    Jan 26 at 8:22






  • 1




    This screams for interval trees.
    – RobAu
    Jan 26 at 10:28
















up vote
17
down vote

favorite
1












This is the Flipping coins problem on CodeChef:




There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:



  1. Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B"


  2. Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".


Input:



The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.



Output:



Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.




Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?



n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1






share|improve this question

















  • 3




    It seems that some people succeeded using pypy
    – oliverpool
    Jan 25 at 16:39






  • 11




    #do Something is worse than no comment at all
    – Bailey Parker
    Jan 26 at 0:07






  • 1




    You don't need to list before you unpack. Unpacking works with generators.
    – Bailey Parker
    Jan 26 at 0:08






  • 1




    The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
    – Barmar
    Jan 26 at 8:22






  • 1




    This screams for interval trees.
    – RobAu
    Jan 26 at 10:28












up vote
17
down vote

favorite
1









up vote
17
down vote

favorite
1






1





This is the Flipping coins problem on CodeChef:




There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:



  1. Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B"


  2. Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".


Input:



The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.



Output:



Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.




Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?



n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1






share|improve this question













This is the Flipping coins problem on CodeChef:




There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:



  1. Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B"


  2. Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".


Input:



The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.



Output:



Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.




Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?



n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1








share|improve this question












share|improve this question




share|improve this question








edited Feb 1 at 17:24









200_success

123k14143401




123k14143401









asked Jan 25 at 16:03









Ashwin Sekhari

885




885







  • 3




    It seems that some people succeeded using pypy
    – oliverpool
    Jan 25 at 16:39






  • 11




    #do Something is worse than no comment at all
    – Bailey Parker
    Jan 26 at 0:07






  • 1




    You don't need to list before you unpack. Unpacking works with generators.
    – Bailey Parker
    Jan 26 at 0:08






  • 1




    The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
    – Barmar
    Jan 26 at 8:22






  • 1




    This screams for interval trees.
    – RobAu
    Jan 26 at 10:28












  • 3




    It seems that some people succeeded using pypy
    – oliverpool
    Jan 25 at 16:39






  • 11




    #do Something is worse than no comment at all
    – Bailey Parker
    Jan 26 at 0:07






  • 1




    You don't need to list before you unpack. Unpacking works with generators.
    – Bailey Parker
    Jan 26 at 0:08






  • 1




    The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
    – Barmar
    Jan 26 at 8:22






  • 1




    This screams for interval trees.
    – RobAu
    Jan 26 at 10:28







3




3




It seems that some people succeeded using pypy
– oliverpool
Jan 25 at 16:39




It seems that some people succeeded using pypy
– oliverpool
Jan 25 at 16:39




11




11




#do Something is worse than no comment at all
– Bailey Parker
Jan 26 at 0:07




#do Something is worse than no comment at all
– Bailey Parker
Jan 26 at 0:07




1




1




You don't need to list before you unpack. Unpacking works with generators.
– Bailey Parker
Jan 26 at 0:08




You don't need to list before you unpack. Unpacking works with generators.
– Bailey Parker
Jan 26 at 0:08




1




1




The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
– Barmar
Jan 26 at 8:22




The if statement to flip the coins can be simplified to c[m] = 1 - c[m]
– Barmar
Jan 26 at 8:22




1




1




This screams for interval trees.
– RobAu
Jan 26 at 10:28




This screams for interval trees.
– RobAu
Jan 26 at 10:28










5 Answers
5






active

oldest

votes

















up vote
14
down vote



accepted










The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.



You wrote:



n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1


When I read the assignment, this is what I saw:




There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:



1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...




Yet, when I look at your code, I see this:



for i in range(q):
j,k,l = list(map(int,input().split()))


You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.



Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:



N, num_opns = list(map(int, input().split()))

for i in range(num_opns):
operation, A, B = list(map(int, input().split()))


See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.



Adding Faster



Next, let's look at your two operations. Here's your first one, counting the 1 values:



for m in range(k,l+1):
if c[m] == 1:
co += 1


Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.



Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:



for m in range(k,l+1):
co += c[m]


And get the same result!



Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:



co = sum(c[k:l+1])


There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.



Switching Faster



Your second operation just changes values:



for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1


There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.



for m in range(k,l+1):
c[m] ^= 1


Alternatively, you could store the values in a tuple and map the old value to the new value:



new_values = (1, 0)
for m in range(k,l+1):
c[m] = new_values[c[m]]


Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.






share|improve this answer




























    up vote
    11
    down vote













    Thou shall not brute force.



    • There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.



    • The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:



      • Decouple the 0 a b operation into a and b+1

      • Maintain the list of flip points, and keep it ordered

      • To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.


    The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.






    share|improve this answer





















    • I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
      – TemporalWolf
      Jan 25 at 21:18







    • 1




      The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
      – Austin Hastings
      Jan 25 at 21:37










    • Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
      – Graipher
      Jan 25 at 22:15










    • @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
      – TemporalWolf
      Jan 25 at 22:31






    • 1




      And how about writing some code?
      – Ev. Kounis
      Jan 26 at 9:04

















    up vote
    11
    down vote













    I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.



    First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.



    Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.



    (iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.



    There is an example implementation below. (Sorry, it's not pretty.)



    Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.



    Edit to add more detailed explanation:



    The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.



    This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).



    The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.



    For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)



    First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.



    For example, what should we do to flip the interval $[16,20) = I_2,4$?



    First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)



    Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)



    Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)



    Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by



    from collections import defaultdict
    sums=[defaultdict(int) for i in range(n+1)]
    flips=[defaultdict(bool) for i in range(n+1)]


    but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.



    from sys import stdin

    def getline(): return map(int,stdin.readline().split())
    N, Q = getline()
    n=1
    while (1<<n)<N: n+=1

    sums=[[0]*((N>>i)+2) for i in range(n+1)]
    flips=[[False]*((N>>i)+1) for i in range(n+1)]

    # We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
    #
    # These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
    # and the parent of I_i,j is I_i+1,j>>1.
    #
    # The parent interval I_i,j is the disjoint union of its two children.
    # For example if i=3, j=5, then I_3,5=[40,48) is the union of
    # I_2,10=[40,44) and I_2,11=[44,48).
    #
    # The interval [0,R) decomposes into a disjoint union of standard intervals
    # I_i,j according to the binary expansion of R:
    # If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
    # [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
    #
    # The collection of all ancestor nodes of all standard intervals in the decomposition
    # of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
    #
    # We maintain the invariant:
    # Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
    # as (i',j') ranges over the ancestors of (i,j) in the tree.

    def flip(R):# flip coins [0,R)
    m=0;M=1
    while M<=R:
    if R&M:# flip coins [(R-M)&-M, R&-M)
    t=(R-M)>>m
    sums[m][t]=M-sums[m][t]
    flips[m][t]=not flips[m][t]# invert flip status to make descendants work
    m+=1;M<<=1
    # Fix ancestors of changed nodes
    m=1;M=2
    while M<=N:
    R2=R>>m
    sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
    if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
    m+=1;M<<=1

    def csum(R):# Sum over [0,R)
    s=0
    fl=False
    for m in range(n,-1,-1):
    M=1<<m
    if R&M:
    t=sums[m][(R-M)>>m]
    if fl: t=M-t
    s+=t
    fl=(fl!=flips[m][R>>m])# exclusive or
    return s

    out=
    for i in range(Q):
    op, A, B = getline()# operation, start, end (inclusive)
    B+=1
    if op==0: flip(A);flip(B)
    else: out.append(str(csum(B)-csum(A)))
    print("n".join(out))





    share|improve this answer



















    • 1




      I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
      – Alex Selby
      Jan 26 at 10:43






    • 2




      It's clever code, but it's way too clever for someone like me!
      – Gareth Rees
      Jan 26 at 11:00







    • 2




      I'll add more explanation.
      – Alex Selby
      Jan 26 at 11:05










    • I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
      – Graipher
      Jan 26 at 11:45











    • I've added a more detailed explanation with an example and links to the code.
      – Alex Selby
      Jan 26 at 12:01

















    up vote
    6
    down vote













    Since Codechef seems to support numpy, I would propose a solution like this:



    import numpy as np

    n, q = map(int, input().split())
    coins = np.zeros(n, dtype=bool)

    for _ in range(q):
    command, start, end = map(int, input().split())
    end += 1 # end is inclusive
    if command == 0:
    coins[start:end] = ~coins[start:end]
    elif command == 1:
    print(coins[start:end].sum())


    This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).



    I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).



    I also added some whitespace, according to Python's official style-guide, PEP8.



    This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).



    import random

    n, q = 100000, 100000
    data = "n".join(" ".format(random.choice([0, 1]),
    *sorted([random.randrange(n),
    random.randrange(n)]))
    for _ in range(q))

    with open("flipping_coins_input.dat", "w") as f:
    f.write(" n".format(n, q))
    f.write(data)



    The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:



    import numpy as np

    n, q = map(int, input().split())
    c = np.zeros(n, dtype=bool)

    out =
    for _ in range(q):
    command, start, end = map(int, input().split())
    if command == 0:
    c[start:end + 1] = ~c[start:end + 1]
    elif command == 1:
    # print(c[start:end + 1].sum())
    out.append(str(c[start:end + 1].sum()))

    print("n".join(out))


    This shaves off about 10% (so it takes 2.7s).






    share|improve this answer



















    • 1




      Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
      – Ev. Kounis
      Jan 25 at 16:39











    • @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
      – Graipher
      Jan 25 at 16:42






    • 1




      Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
      – Ev. Kounis
      Jan 25 at 16:43







    • 1




      Your solution still brute forces.
      – vnp
      Jan 25 at 21:02






    • 1




      @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
      – Graipher
      Jan 26 at 11:12

















    up vote
    6
    down vote













    It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).



    I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.



    This is a table of timings



    • Program 1 = Graipher's numpy-based method

    • Program 2 = $O(Qlog(N))$ tree-based method

    • Program 3 = numpy-based method with blocks (below)

    Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.




    Python 2 Python 3 pypy
    Program 1 2.48 2.65 n/a
    Program 2 1.58 2.16 0.29
    Program 3 1.70 1.85 n/a


    from sys import stdin
    import numpy as np

    def getline(): return map(int,stdin.readline().split())

    N, Q = getline()

    # c0 = bitmap of heads, divided into blocks of size 64
    N1=N//64+1
    c0 = np.zeros(N1, dtype=np.int64)

    # c1 = number of set bits for each block
    c1 = np.full(N1, 0, dtype=int)

    # Number of set bits in 16 bit range
    btc16=[0]
    for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

    # Number of set bits in 64 bit range
    def btc64(x):
    return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

    out =
    for _ in range(Q):
    command, start, end = getline()
    end+=1
    # Decompose start into block number, s1, and bit number within block, s0
    s1=start>>6; s0=start&63
    # Same for end
    e1=end>>6; e0=end&63
    if command == 0:
    if s1<e1:
    c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
    c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
    c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
    else:
    c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
    elif command == 1:
    if s1<e1:
    t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
    else:
    t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
    out.append(str(t))

    print("n".join(out))





    share|improve this answer























      Your Answer




      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "196"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );








       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185980%2fflipping-coins-performance%23new-answer', 'question_page');

      );

      Post as a guest






























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      14
      down vote



      accepted










      The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.



      You wrote:



      n,q = map(int,input().split())
      c = [0]*n
      for i in range(q):
      j,k,l = list(map(int,input().split()))
      co = 0
      if j == 1:
      #do Something
      for m in range(k,l+1):
      if c[m] == 1:
      co += 1
      print(co)
      if j == 0:
      #do something
      for m in range(k,l+1):
      if c[m] == 1:
      c[m] = 0
      else:
      c[m] = 1


      When I read the assignment, this is what I saw:




      There are N coins kept on the table, numbered from 0 to N - 1.
      Initially, each coin is kept tails up. You have to perform two types
      of operations:



      1) Flip all coins numbered between A and B inclusive. This is
      represented by the command "0 A B" ...




      Yet, when I look at your code, I see this:



      for i in range(q):
      j,k,l = list(map(int,input().split()))


      You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.



      Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:



      N, num_opns = list(map(int, input().split()))

      for i in range(num_opns):
      operation, A, B = list(map(int, input().split()))


      See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.



      Adding Faster



      Next, let's look at your two operations. Here's your first one, counting the 1 values:



      for m in range(k,l+1):
      if c[m] == 1:
      co += 1


      Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.



      Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:



      for m in range(k,l+1):
      co += c[m]


      And get the same result!



      Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:



      co = sum(c[k:l+1])


      There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.



      Switching Faster



      Your second operation just changes values:



      for m in range(k,l+1):
      if c[m] == 1:
      c[m] = 0
      else:
      c[m] = 1


      There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.



      for m in range(k,l+1):
      c[m] ^= 1


      Alternatively, you could store the values in a tuple and map the old value to the new value:



      new_values = (1, 0)
      for m in range(k,l+1):
      c[m] = new_values[c[m]]


      Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.






      share|improve this answer

























        up vote
        14
        down vote



        accepted










        The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.



        You wrote:



        n,q = map(int,input().split())
        c = [0]*n
        for i in range(q):
        j,k,l = list(map(int,input().split()))
        co = 0
        if j == 1:
        #do Something
        for m in range(k,l+1):
        if c[m] == 1:
        co += 1
        print(co)
        if j == 0:
        #do something
        for m in range(k,l+1):
        if c[m] == 1:
        c[m] = 0
        else:
        c[m] = 1


        When I read the assignment, this is what I saw:




        There are N coins kept on the table, numbered from 0 to N - 1.
        Initially, each coin is kept tails up. You have to perform two types
        of operations:



        1) Flip all coins numbered between A and B inclusive. This is
        represented by the command "0 A B" ...




        Yet, when I look at your code, I see this:



        for i in range(q):
        j,k,l = list(map(int,input().split()))


        You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.



        Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:



        N, num_opns = list(map(int, input().split()))

        for i in range(num_opns):
        operation, A, B = list(map(int, input().split()))


        See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.



        Adding Faster



        Next, let's look at your two operations. Here's your first one, counting the 1 values:



        for m in range(k,l+1):
        if c[m] == 1:
        co += 1


        Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.



        Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:



        for m in range(k,l+1):
        co += c[m]


        And get the same result!



        Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:



        co = sum(c[k:l+1])


        There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.



        Switching Faster



        Your second operation just changes values:



        for m in range(k,l+1):
        if c[m] == 1:
        c[m] = 0
        else:
        c[m] = 1


        There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.



        for m in range(k,l+1):
        c[m] ^= 1


        Alternatively, you could store the values in a tuple and map the old value to the new value:



        new_values = (1, 0)
        for m in range(k,l+1):
        c[m] = new_values[c[m]]


        Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.






        share|improve this answer























          up vote
          14
          down vote



          accepted







          up vote
          14
          down vote



          accepted






          The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.



          You wrote:



          n,q = map(int,input().split())
          c = [0]*n
          for i in range(q):
          j,k,l = list(map(int,input().split()))
          co = 0
          if j == 1:
          #do Something
          for m in range(k,l+1):
          if c[m] == 1:
          co += 1
          print(co)
          if j == 0:
          #do something
          for m in range(k,l+1):
          if c[m] == 1:
          c[m] = 0
          else:
          c[m] = 1


          When I read the assignment, this is what I saw:




          There are N coins kept on the table, numbered from 0 to N - 1.
          Initially, each coin is kept tails up. You have to perform two types
          of operations:



          1) Flip all coins numbered between A and B inclusive. This is
          represented by the command "0 A B" ...




          Yet, when I look at your code, I see this:



          for i in range(q):
          j,k,l = list(map(int,input().split()))


          You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.



          Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:



          N, num_opns = list(map(int, input().split()))

          for i in range(num_opns):
          operation, A, B = list(map(int, input().split()))


          See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.



          Adding Faster



          Next, let's look at your two operations. Here's your first one, counting the 1 values:



          for m in range(k,l+1):
          if c[m] == 1:
          co += 1


          Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.



          Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:



          for m in range(k,l+1):
          co += c[m]


          And get the same result!



          Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:



          co = sum(c[k:l+1])


          There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.



          Switching Faster



          Your second operation just changes values:



          for m in range(k,l+1):
          if c[m] == 1:
          c[m] = 0
          else:
          c[m] = 1


          There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.



          for m in range(k,l+1):
          c[m] ^= 1


          Alternatively, you could store the values in a tuple and map the old value to the new value:



          new_values = (1, 0)
          for m in range(k,l+1):
          c[m] = new_values[c[m]]


          Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.






          share|improve this answer













          The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.



          You wrote:



          n,q = map(int,input().split())
          c = [0]*n
          for i in range(q):
          j,k,l = list(map(int,input().split()))
          co = 0
          if j == 1:
          #do Something
          for m in range(k,l+1):
          if c[m] == 1:
          co += 1
          print(co)
          if j == 0:
          #do something
          for m in range(k,l+1):
          if c[m] == 1:
          c[m] = 0
          else:
          c[m] = 1


          When I read the assignment, this is what I saw:




          There are N coins kept on the table, numbered from 0 to N - 1.
          Initially, each coin is kept tails up. You have to perform two types
          of operations:



          1) Flip all coins numbered between A and B inclusive. This is
          represented by the command "0 A B" ...




          Yet, when I look at your code, I see this:



          for i in range(q):
          j,k,l = list(map(int,input().split()))


          You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.



          Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:



          N, num_opns = list(map(int, input().split()))

          for i in range(num_opns):
          operation, A, B = list(map(int, input().split()))


          See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.



          Adding Faster



          Next, let's look at your two operations. Here's your first one, counting the 1 values:



          for m in range(k,l+1):
          if c[m] == 1:
          co += 1


          Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.



          Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:



          for m in range(k,l+1):
          co += c[m]


          And get the same result!



          Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:



          co = sum(c[k:l+1])


          There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.



          Switching Faster



          Your second operation just changes values:



          for m in range(k,l+1):
          if c[m] == 1:
          c[m] = 0
          else:
          c[m] = 1


          There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.



          for m in range(k,l+1):
          c[m] ^= 1


          Alternatively, you could store the values in a tuple and map the old value to the new value:



          new_values = (1, 0)
          for m in range(k,l+1):
          c[m] = new_values[c[m]]


          Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.







          share|improve this answer













          share|improve this answer



          share|improve this answer











          answered Jan 25 at 21:50









          Austin Hastings

          6,1591130




          6,1591130






















              up vote
              11
              down vote













              Thou shall not brute force.



              • There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.



              • The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:



                • Decouple the 0 a b operation into a and b+1

                • Maintain the list of flip points, and keep it ordered

                • To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.


              The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.






              share|improve this answer





















              • I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
                – TemporalWolf
                Jan 25 at 21:18







              • 1




                The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
                – Austin Hastings
                Jan 25 at 21:37










              • Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
                – Graipher
                Jan 25 at 22:15










              • @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
                – TemporalWolf
                Jan 25 at 22:31






              • 1




                And how about writing some code?
                – Ev. Kounis
                Jan 26 at 9:04














              up vote
              11
              down vote













              Thou shall not brute force.



              • There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.



              • The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:



                • Decouple the 0 a b operation into a and b+1

                • Maintain the list of flip points, and keep it ordered

                • To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.


              The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.






              share|improve this answer





















              • I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
                – TemporalWolf
                Jan 25 at 21:18







              • 1




                The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
                – Austin Hastings
                Jan 25 at 21:37










              • Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
                – Graipher
                Jan 25 at 22:15










              • @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
                – TemporalWolf
                Jan 25 at 22:31






              • 1




                And how about writing some code?
                – Ev. Kounis
                Jan 26 at 9:04












              up vote
              11
              down vote










              up vote
              11
              down vote









              Thou shall not brute force.



              • There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.



              • The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:



                • Decouple the 0 a b operation into a and b+1

                • Maintain the list of flip points, and keep it ordered

                • To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.


              The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.






              share|improve this answer













              Thou shall not brute force.



              • There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.



              • The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:



                • Decouple the 0 a b operation into a and b+1

                • Maintain the list of flip points, and keep it ordered

                • To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.


              The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.







              share|improve this answer













              share|improve this answer



              share|improve this answer











              answered Jan 25 at 21:01









              vnp

              36.6k12991




              36.6k12991











              • I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
                – TemporalWolf
                Jan 25 at 21:18







              • 1




                The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
                – Austin Hastings
                Jan 25 at 21:37










              • Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
                – Graipher
                Jan 25 at 22:15










              • @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
                – TemporalWolf
                Jan 25 at 22:31






              • 1




                And how about writing some code?
                – Ev. Kounis
                Jan 26 at 9:04
















              • I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
                – TemporalWolf
                Jan 25 at 21:18







              • 1




                The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
                – Austin Hastings
                Jan 25 at 21:37










              • Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
                – Graipher
                Jan 25 at 22:15










              • @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
                – TemporalWolf
                Jan 25 at 22:31






              • 1




                And how about writing some code?
                – Ev. Kounis
                Jan 26 at 9:04















              I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
              – TemporalWolf
              Jan 25 at 21:18





              I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
              – TemporalWolf
              Jan 25 at 21:18





              1




              1




              The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
              – Austin Hastings
              Jan 25 at 21:37




              The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
              – Austin Hastings
              Jan 25 at 21:37












              Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
              – Graipher
              Jan 25 at 22:15




              Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
              – Graipher
              Jan 25 at 22:15












              @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
              – TemporalWolf
              Jan 25 at 22:31




              @Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
              – TemporalWolf
              Jan 25 at 22:31




              1




              1




              And how about writing some code?
              – Ev. Kounis
              Jan 26 at 9:04




              And how about writing some code?
              – Ev. Kounis
              Jan 26 at 9:04










              up vote
              11
              down vote













              I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.



              First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.



              Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.



              (iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.



              There is an example implementation below. (Sorry, it's not pretty.)



              Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.



              Edit to add more detailed explanation:



              The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.



              This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).



              The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.



              For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)



              First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.



              For example, what should we do to flip the interval $[16,20) = I_2,4$?



              First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)



              Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)



              Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)



              Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by



              from collections import defaultdict
              sums=[defaultdict(int) for i in range(n+1)]
              flips=[defaultdict(bool) for i in range(n+1)]


              but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.



              from sys import stdin

              def getline(): return map(int,stdin.readline().split())
              N, Q = getline()
              n=1
              while (1<<n)<N: n+=1

              sums=[[0]*((N>>i)+2) for i in range(n+1)]
              flips=[[False]*((N>>i)+1) for i in range(n+1)]

              # We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
              #
              # These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
              # and the parent of I_i,j is I_i+1,j>>1.
              #
              # The parent interval I_i,j is the disjoint union of its two children.
              # For example if i=3, j=5, then I_3,5=[40,48) is the union of
              # I_2,10=[40,44) and I_2,11=[44,48).
              #
              # The interval [0,R) decomposes into a disjoint union of standard intervals
              # I_i,j according to the binary expansion of R:
              # If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
              # [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
              #
              # The collection of all ancestor nodes of all standard intervals in the decomposition
              # of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
              #
              # We maintain the invariant:
              # Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
              # as (i',j') ranges over the ancestors of (i,j) in the tree.

              def flip(R):# flip coins [0,R)
              m=0;M=1
              while M<=R:
              if R&M:# flip coins [(R-M)&-M, R&-M)
              t=(R-M)>>m
              sums[m][t]=M-sums[m][t]
              flips[m][t]=not flips[m][t]# invert flip status to make descendants work
              m+=1;M<<=1
              # Fix ancestors of changed nodes
              m=1;M=2
              while M<=N:
              R2=R>>m
              sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
              if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
              m+=1;M<<=1

              def csum(R):# Sum over [0,R)
              s=0
              fl=False
              for m in range(n,-1,-1):
              M=1<<m
              if R&M:
              t=sums[m][(R-M)>>m]
              if fl: t=M-t
              s+=t
              fl=(fl!=flips[m][R>>m])# exclusive or
              return s

              out=
              for i in range(Q):
              op, A, B = getline()# operation, start, end (inclusive)
              B+=1
              if op==0: flip(A);flip(B)
              else: out.append(str(csum(B)-csum(A)))
              print("n".join(out))





              share|improve this answer



















              • 1




                I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
                – Alex Selby
                Jan 26 at 10:43






              • 2




                It's clever code, but it's way too clever for someone like me!
                – Gareth Rees
                Jan 26 at 11:00







              • 2




                I'll add more explanation.
                – Alex Selby
                Jan 26 at 11:05










              • I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
                – Graipher
                Jan 26 at 11:45











              • I've added a more detailed explanation with an example and links to the code.
                – Alex Selby
                Jan 26 at 12:01














              up vote
              11
              down vote













              I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.



              First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.



              Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.



              (iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.



              There is an example implementation below. (Sorry, it's not pretty.)



              Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.



              Edit to add more detailed explanation:



              The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.



              This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).



              The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.



              For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)



              First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.



              For example, what should we do to flip the interval $[16,20) = I_2,4$?



              First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)



              Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)



              Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)



              Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by



              from collections import defaultdict
              sums=[defaultdict(int) for i in range(n+1)]
              flips=[defaultdict(bool) for i in range(n+1)]


              but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.



              from sys import stdin

              def getline(): return map(int,stdin.readline().split())
              N, Q = getline()
              n=1
              while (1<<n)<N: n+=1

              sums=[[0]*((N>>i)+2) for i in range(n+1)]
              flips=[[False]*((N>>i)+1) for i in range(n+1)]

              # We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
              #
              # These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
              # and the parent of I_i,j is I_i+1,j>>1.
              #
              # The parent interval I_i,j is the disjoint union of its two children.
              # For example if i=3, j=5, then I_3,5=[40,48) is the union of
              # I_2,10=[40,44) and I_2,11=[44,48).
              #
              # The interval [0,R) decomposes into a disjoint union of standard intervals
              # I_i,j according to the binary expansion of R:
              # If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
              # [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
              #
              # The collection of all ancestor nodes of all standard intervals in the decomposition
              # of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
              #
              # We maintain the invariant:
              # Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
              # as (i',j') ranges over the ancestors of (i,j) in the tree.

              def flip(R):# flip coins [0,R)
              m=0;M=1
              while M<=R:
              if R&M:# flip coins [(R-M)&-M, R&-M)
              t=(R-M)>>m
              sums[m][t]=M-sums[m][t]
              flips[m][t]=not flips[m][t]# invert flip status to make descendants work
              m+=1;M<<=1
              # Fix ancestors of changed nodes
              m=1;M=2
              while M<=N:
              R2=R>>m
              sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
              if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
              m+=1;M<<=1

              def csum(R):# Sum over [0,R)
              s=0
              fl=False
              for m in range(n,-1,-1):
              M=1<<m
              if R&M:
              t=sums[m][(R-M)>>m]
              if fl: t=M-t
              s+=t
              fl=(fl!=flips[m][R>>m])# exclusive or
              return s

              out=
              for i in range(Q):
              op, A, B = getline()# operation, start, end (inclusive)
              B+=1
              if op==0: flip(A);flip(B)
              else: out.append(str(csum(B)-csum(A)))
              print("n".join(out))





              share|improve this answer



















              • 1




                I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
                – Alex Selby
                Jan 26 at 10:43






              • 2




                It's clever code, but it's way too clever for someone like me!
                – Gareth Rees
                Jan 26 at 11:00







              • 2




                I'll add more explanation.
                – Alex Selby
                Jan 26 at 11:05










              • I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
                – Graipher
                Jan 26 at 11:45











              • I've added a more detailed explanation with an example and links to the code.
                – Alex Selby
                Jan 26 at 12:01












              up vote
              11
              down vote










              up vote
              11
              down vote









              I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.



              First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.



              Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.



              (iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.



              There is an example implementation below. (Sorry, it's not pretty.)



              Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.



              Edit to add more detailed explanation:



              The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.



              This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).



              The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.



              For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)



              First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.



              For example, what should we do to flip the interval $[16,20) = I_2,4$?



              First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)



              Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)



              Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)



              Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by



              from collections import defaultdict
              sums=[defaultdict(int) for i in range(n+1)]
              flips=[defaultdict(bool) for i in range(n+1)]


              but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.



              from sys import stdin

              def getline(): return map(int,stdin.readline().split())
              N, Q = getline()
              n=1
              while (1<<n)<N: n+=1

              sums=[[0]*((N>>i)+2) for i in range(n+1)]
              flips=[[False]*((N>>i)+1) for i in range(n+1)]

              # We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
              #
              # These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
              # and the parent of I_i,j is I_i+1,j>>1.
              #
              # The parent interval I_i,j is the disjoint union of its two children.
              # For example if i=3, j=5, then I_3,5=[40,48) is the union of
              # I_2,10=[40,44) and I_2,11=[44,48).
              #
              # The interval [0,R) decomposes into a disjoint union of standard intervals
              # I_i,j according to the binary expansion of R:
              # If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
              # [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
              #
              # The collection of all ancestor nodes of all standard intervals in the decomposition
              # of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
              #
              # We maintain the invariant:
              # Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
              # as (i',j') ranges over the ancestors of (i,j) in the tree.

              def flip(R):# flip coins [0,R)
              m=0;M=1
              while M<=R:
              if R&M:# flip coins [(R-M)&-M, R&-M)
              t=(R-M)>>m
              sums[m][t]=M-sums[m][t]
              flips[m][t]=not flips[m][t]# invert flip status to make descendants work
              m+=1;M<<=1
              # Fix ancestors of changed nodes
              m=1;M=2
              while M<=N:
              R2=R>>m
              sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
              if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
              m+=1;M<<=1

              def csum(R):# Sum over [0,R)
              s=0
              fl=False
              for m in range(n,-1,-1):
              M=1<<m
              if R&M:
              t=sums[m][(R-M)>>m]
              if fl: t=M-t
              s+=t
              fl=(fl!=flips[m][R>>m])# exclusive or
              return s

              out=
              for i in range(Q):
              op, A, B = getline()# operation, start, end (inclusive)
              B+=1
              if op==0: flip(A);flip(B)
              else: out.append(str(csum(B)-csum(A)))
              print("n".join(out))





              share|improve this answer















              I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.



              First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.



              Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.



              (iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.



              There is an example implementation below. (Sorry, it's not pretty.)



              Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.



              Edit to add more detailed explanation:



              The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.



              This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).



              The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.



              For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)



              First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.



              For example, what should we do to flip the interval $[16,20) = I_2,4$?



              First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)



              Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)



              Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)



              Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by



              from collections import defaultdict
              sums=[defaultdict(int) for i in range(n+1)]
              flips=[defaultdict(bool) for i in range(n+1)]


              but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.



              from sys import stdin

              def getline(): return map(int,stdin.readline().split())
              N, Q = getline()
              n=1
              while (1<<n)<N: n+=1

              sums=[[0]*((N>>i)+2) for i in range(n+1)]
              flips=[[False]*((N>>i)+1) for i in range(n+1)]

              # We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
              #
              # These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
              # and the parent of I_i,j is I_i+1,j>>1.
              #
              # The parent interval I_i,j is the disjoint union of its two children.
              # For example if i=3, j=5, then I_3,5=[40,48) is the union of
              # I_2,10=[40,44) and I_2,11=[44,48).
              #
              # The interval [0,R) decomposes into a disjoint union of standard intervals
              # I_i,j according to the binary expansion of R:
              # If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
              # [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
              #
              # The collection of all ancestor nodes of all standard intervals in the decomposition
              # of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
              #
              # We maintain the invariant:
              # Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
              # as (i',j') ranges over the ancestors of (i,j) in the tree.

              def flip(R):# flip coins [0,R)
              m=0;M=1
              while M<=R:
              if R&M:# flip coins [(R-M)&-M, R&-M)
              t=(R-M)>>m
              sums[m][t]=M-sums[m][t]
              flips[m][t]=not flips[m][t]# invert flip status to make descendants work
              m+=1;M<<=1
              # Fix ancestors of changed nodes
              m=1;M=2
              while M<=N:
              R2=R>>m
              sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
              if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
              m+=1;M<<=1

              def csum(R):# Sum over [0,R)
              s=0
              fl=False
              for m in range(n,-1,-1):
              M=1<<m
              if R&M:
              t=sums[m][(R-M)>>m]
              if fl: t=M-t
              s+=t
              fl=(fl!=flips[m][R>>m])# exclusive or
              return s

              out=
              for i in range(Q):
              op, A, B = getline()# operation, start, end (inclusive)
              B+=1
              if op==0: flip(A);flip(B)
              else: out.append(str(csum(B)-csum(A)))
              print("n".join(out))






              share|improve this answer















              share|improve this answer



              share|improve this answer








              edited Feb 1 at 16:21


























              answered Jan 26 at 9:54









              Alex Selby

              68127




              68127







              • 1




                I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
                – Alex Selby
                Jan 26 at 10:43






              • 2




                It's clever code, but it's way too clever for someone like me!
                – Gareth Rees
                Jan 26 at 11:00







              • 2




                I'll add more explanation.
                – Alex Selby
                Jan 26 at 11:05










              • I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
                – Graipher
                Jan 26 at 11:45











              • I've added a more detailed explanation with an example and links to the code.
                – Alex Selby
                Jan 26 at 12:01












              • 1




                I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
                – Alex Selby
                Jan 26 at 10:43






              • 2




                It's clever code, but it's way too clever for someone like me!
                – Gareth Rees
                Jan 26 at 11:00







              • 2




                I'll add more explanation.
                – Alex Selby
                Jan 26 at 11:05










              • I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
                – Graipher
                Jan 26 at 11:45











              • I've added a more detailed explanation with an example and links to the code.
                – Alex Selby
                Jan 26 at 12:01







              1




              1




              I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
              – Alex Selby
              Jan 26 at 10:43




              I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance.
              – Alex Selby
              Jan 26 at 10:43




              2




              2




              It's clever code, but it's way too clever for someone like me!
              – Gareth Rees
              Jan 26 at 11:00





              It's clever code, but it's way too clever for someone like me!
              – Gareth Rees
              Jan 26 at 11:00





              2




              2




              I'll add more explanation.
              – Alex Selby
              Jan 26 at 11:05




              I'll add more explanation.
              – Alex Selby
              Jan 26 at 11:05












              I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
              – Graipher
              Jan 26 at 11:45





              I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough...
              – Graipher
              Jan 26 at 11:45













              I've added a more detailed explanation with an example and links to the code.
              – Alex Selby
              Jan 26 at 12:01




              I've added a more detailed explanation with an example and links to the code.
              – Alex Selby
              Jan 26 at 12:01










              up vote
              6
              down vote













              Since Codechef seems to support numpy, I would propose a solution like this:



              import numpy as np

              n, q = map(int, input().split())
              coins = np.zeros(n, dtype=bool)

              for _ in range(q):
              command, start, end = map(int, input().split())
              end += 1 # end is inclusive
              if command == 0:
              coins[start:end] = ~coins[start:end]
              elif command == 1:
              print(coins[start:end].sum())


              This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).



              I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).



              I also added some whitespace, according to Python's official style-guide, PEP8.



              This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).



              import random

              n, q = 100000, 100000
              data = "n".join(" ".format(random.choice([0, 1]),
              *sorted([random.randrange(n),
              random.randrange(n)]))
              for _ in range(q))

              with open("flipping_coins_input.dat", "w") as f:
              f.write(" n".format(n, q))
              f.write(data)



              The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:



              import numpy as np

              n, q = map(int, input().split())
              c = np.zeros(n, dtype=bool)

              out =
              for _ in range(q):
              command, start, end = map(int, input().split())
              if command == 0:
              c[start:end + 1] = ~c[start:end + 1]
              elif command == 1:
              # print(c[start:end + 1].sum())
              out.append(str(c[start:end + 1].sum()))

              print("n".join(out))


              This shaves off about 10% (so it takes 2.7s).






              share|improve this answer



















              • 1




                Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
                – Ev. Kounis
                Jan 25 at 16:39











              • @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
                – Graipher
                Jan 25 at 16:42






              • 1




                Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
                – Ev. Kounis
                Jan 25 at 16:43







              • 1




                Your solution still brute forces.
                – vnp
                Jan 25 at 21:02






              • 1




                @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
                – Graipher
                Jan 26 at 11:12














              up vote
              6
              down vote













              Since Codechef seems to support numpy, I would propose a solution like this:



              import numpy as np

              n, q = map(int, input().split())
              coins = np.zeros(n, dtype=bool)

              for _ in range(q):
              command, start, end = map(int, input().split())
              end += 1 # end is inclusive
              if command == 0:
              coins[start:end] = ~coins[start:end]
              elif command == 1:
              print(coins[start:end].sum())


              This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).



              I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).



              I also added some whitespace, according to Python's official style-guide, PEP8.



              This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).



              import random

              n, q = 100000, 100000
              data = "n".join(" ".format(random.choice([0, 1]),
              *sorted([random.randrange(n),
              random.randrange(n)]))
              for _ in range(q))

              with open("flipping_coins_input.dat", "w") as f:
              f.write(" n".format(n, q))
              f.write(data)



              The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:



              import numpy as np

              n, q = map(int, input().split())
              c = np.zeros(n, dtype=bool)

              out =
              for _ in range(q):
              command, start, end = map(int, input().split())
              if command == 0:
              c[start:end + 1] = ~c[start:end + 1]
              elif command == 1:
              # print(c[start:end + 1].sum())
              out.append(str(c[start:end + 1].sum()))

              print("n".join(out))


              This shaves off about 10% (so it takes 2.7s).






              share|improve this answer



















              • 1




                Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
                – Ev. Kounis
                Jan 25 at 16:39











              • @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
                – Graipher
                Jan 25 at 16:42






              • 1




                Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
                – Ev. Kounis
                Jan 25 at 16:43







              • 1




                Your solution still brute forces.
                – vnp
                Jan 25 at 21:02






              • 1




                @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
                – Graipher
                Jan 26 at 11:12












              up vote
              6
              down vote










              up vote
              6
              down vote









              Since Codechef seems to support numpy, I would propose a solution like this:



              import numpy as np

              n, q = map(int, input().split())
              coins = np.zeros(n, dtype=bool)

              for _ in range(q):
              command, start, end = map(int, input().split())
              end += 1 # end is inclusive
              if command == 0:
              coins[start:end] = ~coins[start:end]
              elif command == 1:
              print(coins[start:end].sum())


              This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).



              I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).



              I also added some whitespace, according to Python's official style-guide, PEP8.



              This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).



              import random

              n, q = 100000, 100000
              data = "n".join(" ".format(random.choice([0, 1]),
              *sorted([random.randrange(n),
              random.randrange(n)]))
              for _ in range(q))

              with open("flipping_coins_input.dat", "w") as f:
              f.write(" n".format(n, q))
              f.write(data)



              The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:



              import numpy as np

              n, q = map(int, input().split())
              c = np.zeros(n, dtype=bool)

              out =
              for _ in range(q):
              command, start, end = map(int, input().split())
              if command == 0:
              c[start:end + 1] = ~c[start:end + 1]
              elif command == 1:
              # print(c[start:end + 1].sum())
              out.append(str(c[start:end + 1].sum()))

              print("n".join(out))


              This shaves off about 10% (so it takes 2.7s).






              share|improve this answer















              Since Codechef seems to support numpy, I would propose a solution like this:



              import numpy as np

              n, q = map(int, input().split())
              coins = np.zeros(n, dtype=bool)

              for _ in range(q):
              command, start, end = map(int, input().split())
              end += 1 # end is inclusive
              if command == 0:
              coins[start:end] = ~coins[start:end]
              elif command == 1:
              print(coins[start:end].sum())


              This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).



              I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).



              I also added some whitespace, according to Python's official style-guide, PEP8.



              This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).



              import random

              n, q = 100000, 100000
              data = "n".join(" ".format(random.choice([0, 1]),
              *sorted([random.randrange(n),
              random.randrange(n)]))
              for _ in range(q))

              with open("flipping_coins_input.dat", "w") as f:
              f.write(" n".format(n, q))
              f.write(data)



              The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:



              import numpy as np

              n, q = map(int, input().split())
              c = np.zeros(n, dtype=bool)

              out =
              for _ in range(q):
              command, start, end = map(int, input().split())
              if command == 0:
              c[start:end + 1] = ~c[start:end + 1]
              elif command == 1:
              # print(c[start:end + 1].sum())
              out.append(str(c[start:end + 1].sum()))

              print("n".join(out))


              This shaves off about 10% (so it takes 2.7s).







              share|improve this answer















              share|improve this answer



              share|improve this answer








              edited Jan 26 at 11:25


























              answered Jan 25 at 16:19









              Graipher

              20.5k43081




              20.5k43081







              • 1




                Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
                – Ev. Kounis
                Jan 25 at 16:39











              • @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
                – Graipher
                Jan 25 at 16:42






              • 1




                Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
                – Ev. Kounis
                Jan 25 at 16:43







              • 1




                Your solution still brute forces.
                – vnp
                Jan 25 at 21:02






              • 1




                @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
                – Graipher
                Jan 26 at 11:12












              • 1




                Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
                – Ev. Kounis
                Jan 25 at 16:39











              • @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
                – Graipher
                Jan 25 at 16:42






              • 1




                Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
                – Ev. Kounis
                Jan 25 at 16:43







              • 1




                Your solution still brute forces.
                – vnp
                Jan 25 at 21:02






              • 1




                @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
                – Graipher
                Jan 26 at 11:12







              1




              1




              Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
              – Ev. Kounis
              Jan 25 at 16:39





              Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input()
              – Ev. Kounis
              Jan 25 at 16:39













              @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
              – Graipher
              Jan 25 at 16:42




              @Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully).
              – Graipher
              Jan 25 at 16:42




              1




              1




              Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
              – Ev. Kounis
              Jan 25 at 16:43





              Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here
              – Ev. Kounis
              Jan 25 at 16:43





              1




              1




              Your solution still brute forces.
              – vnp
              Jan 25 at 21:02




              Your solution still brute forces.
              – vnp
              Jan 25 at 21:02




              1




              1




              @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
              – Graipher
              Jan 26 at 11:12




              @EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
              – Graipher
              Jan 26 at 11:12










              up vote
              6
              down vote













              It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).



              I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.



              This is a table of timings



              • Program 1 = Graipher's numpy-based method

              • Program 2 = $O(Qlog(N))$ tree-based method

              • Program 3 = numpy-based method with blocks (below)

              Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.




              Python 2 Python 3 pypy
              Program 1 2.48 2.65 n/a
              Program 2 1.58 2.16 0.29
              Program 3 1.70 1.85 n/a


              from sys import stdin
              import numpy as np

              def getline(): return map(int,stdin.readline().split())

              N, Q = getline()

              # c0 = bitmap of heads, divided into blocks of size 64
              N1=N//64+1
              c0 = np.zeros(N1, dtype=np.int64)

              # c1 = number of set bits for each block
              c1 = np.full(N1, 0, dtype=int)

              # Number of set bits in 16 bit range
              btc16=[0]
              for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

              # Number of set bits in 64 bit range
              def btc64(x):
              return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

              out =
              for _ in range(Q):
              command, start, end = getline()
              end+=1
              # Decompose start into block number, s1, and bit number within block, s0
              s1=start>>6; s0=start&63
              # Same for end
              e1=end>>6; e0=end&63
              if command == 0:
              if s1<e1:
              c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
              c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
              c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
              else:
              c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
              elif command == 1:
              if s1<e1:
              t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
              else:
              t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
              out.append(str(t))

              print("n".join(out))





              share|improve this answer



























                up vote
                6
                down vote













                It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).



                I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.



                This is a table of timings



                • Program 1 = Graipher's numpy-based method

                • Program 2 = $O(Qlog(N))$ tree-based method

                • Program 3 = numpy-based method with blocks (below)

                Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.




                Python 2 Python 3 pypy
                Program 1 2.48 2.65 n/a
                Program 2 1.58 2.16 0.29
                Program 3 1.70 1.85 n/a


                from sys import stdin
                import numpy as np

                def getline(): return map(int,stdin.readline().split())

                N, Q = getline()

                # c0 = bitmap of heads, divided into blocks of size 64
                N1=N//64+1
                c0 = np.zeros(N1, dtype=np.int64)

                # c1 = number of set bits for each block
                c1 = np.full(N1, 0, dtype=int)

                # Number of set bits in 16 bit range
                btc16=[0]
                for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

                # Number of set bits in 64 bit range
                def btc64(x):
                return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

                out =
                for _ in range(Q):
                command, start, end = getline()
                end+=1
                # Decompose start into block number, s1, and bit number within block, s0
                s1=start>>6; s0=start&63
                # Same for end
                e1=end>>6; e0=end&63
                if command == 0:
                if s1<e1:
                c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
                c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
                c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
                else:
                c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
                elif command == 1:
                if s1<e1:
                t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
                else:
                t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
                out.append(str(t))

                print("n".join(out))





                share|improve this answer

























                  up vote
                  6
                  down vote










                  up vote
                  6
                  down vote









                  It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).



                  I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.



                  This is a table of timings



                  • Program 1 = Graipher's numpy-based method

                  • Program 2 = $O(Qlog(N))$ tree-based method

                  • Program 3 = numpy-based method with blocks (below)

                  Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.




                  Python 2 Python 3 pypy
                  Program 1 2.48 2.65 n/a
                  Program 2 1.58 2.16 0.29
                  Program 3 1.70 1.85 n/a


                  from sys import stdin
                  import numpy as np

                  def getline(): return map(int,stdin.readline().split())

                  N, Q = getline()

                  # c0 = bitmap of heads, divided into blocks of size 64
                  N1=N//64+1
                  c0 = np.zeros(N1, dtype=np.int64)

                  # c1 = number of set bits for each block
                  c1 = np.full(N1, 0, dtype=int)

                  # Number of set bits in 16 bit range
                  btc16=[0]
                  for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

                  # Number of set bits in 64 bit range
                  def btc64(x):
                  return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

                  out =
                  for _ in range(Q):
                  command, start, end = getline()
                  end+=1
                  # Decompose start into block number, s1, and bit number within block, s0
                  s1=start>>6; s0=start&63
                  # Same for end
                  e1=end>>6; e0=end&63
                  if command == 0:
                  if s1<e1:
                  c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
                  c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
                  c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
                  else:
                  c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
                  elif command == 1:
                  if s1<e1:
                  t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
                  else:
                  t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
                  out.append(str(t))

                  print("n".join(out))





                  share|improve this answer















                  It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).



                  I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.



                  This is a table of timings



                  • Program 1 = Graipher's numpy-based method

                  • Program 2 = $O(Qlog(N))$ tree-based method

                  • Program 3 = numpy-based method with blocks (below)

                  Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.




                  Python 2 Python 3 pypy
                  Program 1 2.48 2.65 n/a
                  Program 2 1.58 2.16 0.29
                  Program 3 1.70 1.85 n/a


                  from sys import stdin
                  import numpy as np

                  def getline(): return map(int,stdin.readline().split())

                  N, Q = getline()

                  # c0 = bitmap of heads, divided into blocks of size 64
                  N1=N//64+1
                  c0 = np.zeros(N1, dtype=np.int64)

                  # c1 = number of set bits for each block
                  c1 = np.full(N1, 0, dtype=int)

                  # Number of set bits in 16 bit range
                  btc16=[0]
                  for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

                  # Number of set bits in 64 bit range
                  def btc64(x):
                  return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

                  out =
                  for _ in range(Q):
                  command, start, end = getline()
                  end+=1
                  # Decompose start into block number, s1, and bit number within block, s0
                  s1=start>>6; s0=start&63
                  # Same for end
                  e1=end>>6; e0=end&63
                  if command == 0:
                  if s1<e1:
                  c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
                  c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
                  c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
                  else:
                  c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
                  elif command == 1:
                  if s1<e1:
                  t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
                  else:
                  t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
                  out.append(str(t))

                  print("n".join(out))






                  share|improve this answer















                  share|improve this answer



                  share|improve this answer








                  edited Jan 28 at 4:09


























                  answered Jan 27 at 18:51









                  Alex Selby

                  68127




                  68127






















                       

                      draft saved


                      draft discarded


























                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185980%2fflipping-coins-performance%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      Popular posts from this blog

                      Chat program with C++ and SFML

                      Function to Return a JSON Like Objects Using VBA Collections and Arrays

                      Will my employers contract hold up in court?