trjhtr

Question

This is the Flipping coins problem on CodeChef:

There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:

Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B"

Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".

Input:

The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.

Output:

Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.

Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?

n,q = map(int,input().split())
c = [0]*n
for i in range(q):
 j,k,l = list(map(int,input().split()))
 co = 0 
 if j == 1:
 #do Something
 for m in range(k,l+1):
 if c[m] == 1:
 co += 1
 print(co)
 if j == 0:
 #do something
 for m in range(k,l+1):
 if c[m] == 1:
 c[m] = 0
 else:
 c[m] = 1

You don't need to list before you unpack. Unpacking works with generators. — Jan 26 at 0:08
The if statement to flip the coins can be simplified to c[m] = 1 - c[m] — Jan 26 at 8:22

Austin Hastings 6,1591130 · Accepted Answer · 2018-01-25 21:50:26Z

The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.

You wrote:

n,q = map(int,input().split())
c = [0]*n
for i in range(q):
 j,k,l = list(map(int,input().split()))
 co = 0 
 if j == 1:
 #do Something
 for m in range(k,l+1):
 if c[m] == 1:
 co += 1
 print(co)
 if j == 0:
 #do something
 for m in range(k,l+1):
 if c[m] == 1:
 c[m] = 0
 else:
 c[m] = 1

When I read the assignment, this is what I saw:

There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:

1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...

Yet, when I look at your code, I see this:

for i in range(q):
 j,k,l = list(map(int,input().split()))

You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.

Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:

N, num_opns = list(map(int, input().split()))

for i in range(num_opns):
 operation, A, B = list(map(int, input().split()))

See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.

Adding Faster

Next, let's look at your two operations. Here's your first one, counting the 1 values:

for m in range(k,l+1):
 if c[m] == 1:
 co += 1

Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.

Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:

for m in range(k,l+1):
 co += c[m]

And get the same result!

Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:

co = sum(c[k:l+1])

There's two things going on in there. The first, c[k:l+1] is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.

Switching Faster

Your second operation just changes values:

for m in range(k,l+1):
 if c[m] == 1:
 c[m] = 0
 else:
 c[m] = 1

There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.

for m in range(k,l+1):
 c[m] ^= 1

Alternatively, you could store the values in a tuple and map the old value to the new value:

new_values = (1, 0)
for m in range(k,l+1):
 c[m] = new_values[c[m]]

Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.

vnp 36.6k12991 · Answer 2 · 2018-01-25 21:01:16Z

Thou shall not brute force.

There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.

The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:
- Decouple the 0 a b operation into a and b+1
- Maintain the list of flip points, and keep it ordered
- To perform a 1 operation, find the largest flip point before A. If its index in flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B adding up lengths of every other interval.

The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.

I'd be interested to see how you came up with O(QlogQ) as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ) — Jan 25 at 21:18
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(QÂ²)$. But that's better than $O(Q * N)$ in this case. — Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved. — Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation. — Jan 25 at 22:31

score 11 · Answer 3 · 2018-02-01 16:21:41Z

I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.

First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.

Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.

(iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.

There is an example implementation below. (Sorry, it's not pretty.)

Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.

Edit to add more detailed explanation:

The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.

This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).

The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.

For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)

First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.

For example, what should we do to flip the interval $[16,20) = I_2,4$?

First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)

Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)

Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)

Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by

from collections import defaultdict
sums=[defaultdict(int) for i in range(n+1)]
flips=[defaultdict(bool) for i in range(n+1)]

but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.

from sys import stdin

def getline(): return map(int,stdin.readline().split())
N, Q = getline()
n=1
while (1<<n)<N: n+=1

sums=[[0]*((N>>i)+2) for i in range(n+1)]
flips=[[False]*((N>>i)+1) for i in range(n+1)]

# We keep information about the dice over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
#
# These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
# and the parent of I_i,j is I_i+1,j>>1.
#
# The parent interval I_i,j is the disjoint union of its two children.
# For example if i=3, j=5, then I_3,5=[40,48) is the union of 
# I_2,10=[40,44) and I_2,11=[44,48).
#
# The interval [0,R) decomposes into a disjoint union of standard intervals
# I_i,j according to the binary expansion of R:
# If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
# [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
#
# The collection of all ancestor nodes of all standard intervals in the decomposition
# of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
#
# We maintain the invariant:
# Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
# as (i',j') ranges over the ancestors of (i,j) in the tree.

def flip(R):# flip coins [0,R)
 m=0;M=1
 while M<=R:
 if R&M:# flip coins [(R-M)&-M, R&-M)
 t=(R-M)>>m
 sums[m][t]=M-sums[m][t]
 flips[m][t]=not flips[m][t]# invert flip status to make descendants work
 m+=1;M<<=1
 # Fix ancestors of changed nodes
 m=1;M=2
 while M<=N:
 R2=R>>m
 sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
 if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
 m+=1;M<<=1

def csum(R):# Sum over [0,R)
 s=0
 fl=False
 for m in range(n,-1,-1):
 M=1<<m
 if R&M:
 t=sums[m][(R-M)>>m]
 if fl: t=M-t
 s+=t
 fl=(fl!=flips[m][R>>m])# exclusive or
 return s

out=
for i in range(Q):
 op, A, B = getline()# operation, start, end (inclusive)
 B+=1
 if op==0: flip(A);flip(B)
 else: out.append(str(csum(B)-csum(A)))
print("n".join(out))

I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question which was primarily about performance. — Jan 26 at 10:43
It's clever code, but it's way too clever for someone like me! — Jan 26 at 11:00
I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup everytime, using list comprehensions for sums and flips, storing (R - M) >> m in a variable, because it was computed three times and delaying the output until the end of the loop, using a list), but it was still not enough... — Jan 26 at 11:45
I've added a more detailed explanation with an example and links to the code. — Jan 26 at 12:01

score 6 · Answer 4 · 2018-01-26 11:25:31Z

Since Codechef seems to support numpy, I would propose a solution like this:

import numpy as np

n, q = map(int, input().split())
coins = np.zeros(n, dtype=bool)

for _ in range(q):
 command, start, end = map(int, input().split())
 end += 1 # end is inclusive
 if command == 0:
 coins[start:end] = ~coins[start:end]
 elif command == 1:
 print(coins[start:end].sum())

This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).

I also made the names a bit clearer, there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).

I also added some whitespace, according to Python's official style-guide, PEP8.

This also exceeds the time-limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper-bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor 10 missing until the highest time-limit (0.3s).

import random

n, q = 100000, 100000
data = "n".join(" ".format(random.choice([0, 1]),
 *sorted([random.randrange(n),
 random.randrange(n)]))
 for _ in range(q))

with open("flipping_coins_input.dat", "w") as f:
 f.write(" n".format(n, q))
 f.write(data)

The code above can be slightly sped-up by delaying the output to after the loop, which reduces the number of flushes:

import numpy as np

n, q = map(int, input().split())
c = np.zeros(n, dtype=bool)

out = 
for _ in range(q):
 command, start, end = map(int, input().split())
 if command == 0:
 c[start:end + 1] = ~c[start:end + 1]
 elif command == 1:
 # print(c[start:end + 1].sum())
 out.append(str(c[start:end + 1].sum()))

print("n".join(out))

This shaves off about 10% (so it takes 2.7s).

Looks pretty streamlined to me. Can't see how this fails the time limit.. I could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input() — Jan 25 at 16:39
@Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high-end of that, hopefully). — Jan 25 at 16:42
Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here — Jan 25 at 16:43
@EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) takes also already half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s). — Jan 26 at 11:12

score 6 · Answer 5 · 2018-01-28 04:09:37Z

It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Qlog(N))$ tree-based pure Python solution (detailed in my other answer).

I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Qlog(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.

This is a table of timings

Program 1 = Graipher's numpy-based method

Program 2 = $O(Qlog(N))$ tree-based method

Program 3 = numpy-based method with blocks (below)

Timings are in seconds using my computer (an ordinary desktop) on a random testfile with $N=Q=100000$.


 Python 2 Python 3 pypy
Program 1 2.48 2.65 n/a
Program 2 1.58 2.16 0.29
Program 3 1.70 1.85 n/a

from sys import stdin
import numpy as np

def getline(): return map(int,stdin.readline().split())

N, Q = getline()

# c0 = bitmap of heads, divided into blocks of size 64
N1=N//64+1
c0 = np.zeros(N1, dtype=np.int64)

# c1 = number of set bits for each block
c1 = np.full(N1, 0, dtype=int)

# Number of set bits in 16 bit range
btc16=[0]
for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

# Number of set bits in 64 bit range
def btc64(x):
 return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

out = 
for _ in range(Q):
 command, start, end = getline()
 end+=1
 # Decompose start into block number, s1, and bit number within block, s0
 s1=start>>6; s0=start&63
 # Same for end
 e1=end>>6; e0=end&63
 if command == 0:
 if s1<e1:
 c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
 c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
 c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
 else:
 c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
 elif command == 1:
 if s1<e1:
 t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
 else:
 t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
 out.append(str(t))

print("n".join(out))

Austin Hastings 6,1591130 · Accepted Answer · 2018-01-25 21:50:26Z