Flipping coins performance
This is the Flipping coins problem on CodeChef:
There are $N$ coins kept on the table, numbered from $0$ to $N - 1$. Initially, each coin is kept tails up. You have to perform two types of operations:
Flip all coins numbered between $A$ and $B$ inclusive. This is represented by the command "0 A B".
Answer how many coins numbered between $A$ and $B$ inclusive are heads up. This is represented by the command "1 A B".
Input:
The first line contains two integers, $N$ and $Q$. Each of the next $Q$ lines are either of the form "0 A B" or "1 A B" as mentioned above.
Output:
Output 1 line for each of the queries of the form "1 A B" containing the required answer for the corresponding query.
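For concreteness, here is a small hand-made example (my addition, not part of the original problem statement). With 4 coins and 3 queries, the input below first asks how many of coins 0-3 are heads (0, since all start tails up), then flips coins 1-2, then asks again (2):
4 3
1 0 3
0 1 2
1 0 3
The expected output is two lines: 0, then 2.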
Out of all the submissions in Python 3.5, none was able to solve it in the provided time limit (probably 0.3s, it is not specified exactly). How can I optimize my code?
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
    j,k,l = list(map(int,input().split()))
    co = 0
    if j == 1:
        #do Something
        for m in range(k,l+1):
            if c[m] == 1:
                co += 1
        print(co)
    if j == 0:
        #do something
        for m in range(k,l+1):
            if c[m] == 1:
                c[m] = 0
            else:
                c[m] = 1
Tags: python, python-3.x, programming-challenge, time-limit-exceeded, interval
Asked Jan 25 at 16:03 by Ashwin Sekhari; edited Feb 1 at 17:24 by 200_success.
Comments:
It seems that some people succeeded using pypy. – oliverpool, Jan 25 at 16:39
"#do Something" is worse than no comment at all. – Bailey Parker, Jan 26 at 0:07
You don't need to call list before you unpack. Unpacking works with generators. – Bailey Parker, Jan 26 at 0:08
The if statement to flip the coins can be simplified to c[m] = 1 - c[m]. – Barmar, Jan 26 at 8:22
This screams for interval trees. – RobAu, Jan 26 at 10:28
5 Answers
Accepted answer, by Austin Hastings (score 14):
The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.
You wrote:
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
    j,k,l = list(map(int,input().split()))
    co = 0
    if j == 1:
        #do Something
        for m in range(k,l+1):
            if c[m] == 1:
                co += 1
        print(co)
    if j == 0:
        #do something
        for m in range(k,l+1):
            if c[m] == 1:
                c[m] = 0
            else:
                c[m] = 1
When I read the assignment, this is what I saw:
There are N coins kept on the table, numbered from 0 to N - 1. Initially, each coin is kept tails up. You have to perform two types of operations:
1) Flip all coins numbered between A and B inclusive. This is represented by the command "0 A B" ...
Yet, when I look at your code, I see this:
for i in range(q):
    j,k,l = list(map(int,input().split()))
You aren't mapping the problem space into your code. This makes it harder for you (and me, and everyone else that sees it) to reason about your code.
Let me suggest right up front that you adopt names that reflect the problem statement wherever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:
N, num_opns = list(map(int, input().split()))
for i in range(num_opns):
    operation, A, B = list(map(int, input().split()))
See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.
Adding Faster
Next, let's look at your two operations. Here's your first one, counting the 1 values:
for m in range(k,l+1):
    if c[m] == 1:
        co += 1
Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.
Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m] is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing: the number doesn't change. So you could change that code to:
for m in range(k,l+1):
    co += c[m]
And get the same result!
Going further, it turns out that Python has a built-in sum that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:
co = sum(c[k:l+1])
There are two things going on in there. The first, c[k:l+1], is called a slicing expression. It creates a "sub-list" of the c list, with just the values in question. The second is the use of the sum() built-in.
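For instance (a tiny illustration added here, not from the original answer, using a made-up list c):
c = [0, 1, 1, 0, 1]
print(c[1:4])       # [1, 1, 0]  <- the slice: positions 1 through 3
print(sum(c[1:4]))  # 2          <- number of heads in that range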
Switching Faster
Your second operation just changes values:
for m in range(k,l+1):
    if c[m] == 1:
        c[m] = 0
    else:
        c[m] = 1
There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things, 0 becomes 1 and 1 becomes 0.
for m in range(k,l+1):
    c[m] ^= 1
Alternatively, you could store the values in a tuple and map the old value to the new value:
new_values = (1, 0)
for m in range(k,l+1):
    c[m] = new_values[c[m]]
Finally, I'll point you at the timeit module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit so you can keep track of what effect every change you make has on performance.
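As a rough sketch of how that might look here (my addition, not part of the answer; the list contents and query range are made up for illustration):
import timeit

setup = "c = [0, 1] * 50000; k, l = 10, 90000"
loop = """
co = 0
for m in range(k, l+1):
    if c[m] == 1:
        co += 1
"""
sliced = "co = sum(c[k:l+1])"

print(timeit.timeit(loop, setup=setup, number=100))    # explicit loop
print(timeit.timeit(sliced, setup=setup, number=100))  # sum() over a slice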
Answer by vnp (score 11):
Thou shall not brute force.
There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of 0 operations.
The key observation is that a 0 a b operation is in fact a pair of operations: flip coins from a to the end, and flip coins from b + 1 to the end. This leads to the solution:
- Decouple the 0 a b operation into a and b+1
- Maintain the list of flip points, and keep it ordered
- To perform a 1 operation, find the largest flip point before A. If its index in the flip list is even, the A coin is tails up, otherwise it is heads up. Then, traverse the flip list until B, adding up the lengths of every other interval.
The space complexity is $O(Q)$; the time complexity is $O(Q \log Q)$.
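To make the bookkeeping concrete, here is a minimal sketch of the flip-point idea (my own illustration, not code from the answer; note that bisect.insort does a linear-time insertion into a Python list, so this particular sketch does not achieve the stated $O(Q \log Q)$ bound, it only demonstrates the mechanics):
import bisect

points = []  # sorted flip points: coin x is heads iff the number of points <= x is odd

def flip(a, b):               # command "0 a b"
    bisect.insort(points, a)
    bisect.insort(points, b + 1)

def count_heads(a, b):        # command "1 a b"
    i = bisect.bisect_right(points, a)   # number of flip points <= a
    state = i % 2                        # state of coin a (1 = heads)
    heads, pos = 0, a
    while i < len(points) and points[i] <= b:
        if state:
            heads += points[i] - pos     # segment [pos, points[i]) is all heads
        pos, state, i = points[i], state ^ 1, i + 1
    if state:
        heads += b + 1 - pos             # final segment [pos, b]
    return heads
For example, flip(1, 2) followed by count_heads(0, 3) returns 2, matching the small worked example near the top of the question.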
Comments:
I'd be interested to see how you came up with $O(Q\log Q)$ as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case $O(\log Q)$. – TemporalWolf, Jan 25 at 21:18
The number of flips or reads is $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(\log Q)$. Since you have to do a linear scan for the summation, that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q(Q + \log Q))$, which is $O(Q^2)$. But that's better than $O(QN)$ in this case. – Austin Hastings, Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved. – Graipher, Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation. – TemporalWolf, Jan 25 at 22:31
And how about writing some code? – Ev. Kounis, Jan 26 at 9:04
Answer by Alex Selby (score 11):
I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $O(Q\log N)$ time by using a tree structure.
First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_{i,j}$, of heads over intervals of the form $I_{i,j}=2^i[j,j+1)$. The children of $I_{i,j}$ are $I_{i-1,2j}$ and $I_{i-1,2j+1}$.
Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_{i,p_i}$, where the $p_i=\sum_{j>i}2^{j-i}r_j$ are derived from the binary expansion $\sum_{j\ge 0}2^j r_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_{i,p_i}$ for each $r_i=1$, which means changing $n_{i,p_i}$ to $2^i-n_{i,p_i}$, (ii) fix up the ancestor nodes of the form $I_{i+1,p_i/2}$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.
(iii) saves the trouble of doing an order-$N$ amount of work fixing up child nodes, at the cost of causing queries an extra $\log_2(N)$ of work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $\log_2(N)$ time.
There is an example implementation below. (Sorry, it's not pretty.)
Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.
Edit to add more detailed explanation:
The invariant that is maintained is that the number of heads over the range $I_{i,j}=[j\,2^i,(j+1)\,2^i)$ is equal to $\text{sums}[i][j]$ or $2^i-\text{sums}[i][j]$ according to whether $\sum_{d>0}\text{flips}[i+d][\lfloor j/2^d\rfloor]$ is even or odd respectively.
This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).
The nodes are indexed by $(i,j)$ for $0\le j<2^{n-i}$, where $n$ is equal to $\log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i j,\,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.
For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips, maintaining the invariant, as if the coins in positions $0,\ldots,22$ had all been flipped.)
First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.
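As a small aside I've added (not part of the answer): this decomposition of $[0,R)$ into standard intervals can be computed directly from the bits of $R$, mirroring the comment block in the code further down.
def standard_intervals(R):
    # Decompose [0, R) into the standard intervals I_{i, (R>>i)-1},
    # one for each set bit i of R.
    for i in range(R.bit_length()):
        if (R >> i) & 1:
            j = (R >> i) - 1
            yield (j << i, (j + 1) << i)

print(list(standard_intervals(23)))  # [(22, 23), (20, 22), (16, 20), (0, 16)]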
For example, what should we do to flip the interval $[16,20) = I_{2,4}$?
First we need to change $\text{sums}[2][4]$ to $4-\text{sums}[2][4]$. That fixes queries of $I_{2,4}$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)
Then we need to fix the count for the parent interval, $[16,24) = I_{3,2}$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$, needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)
Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had been flipped in their entirety. This is effected by inverting the boolean $\text{flips}[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the head count if the number of ancestor flip booleans is odd. (Variable fl in csum().)
Note: the code below is $O(N+Q\log N)$ rather than $O(Q\log N)$. The additional $N$ comes from allocating the arrays sums and flips. This term could be removed by using dictionaries instead, replacing the initialisation of sums and flips by
from collections import defaultdict
sums=[defaultdict(int) for i in range(n+1)]
flips=[defaultdict(bool) for i in range(n+1)]
but I've left it as arrays here because it runs faster for the typical values we are using here, roughly $N=Q=100000$.
from sys import stdin

def getline(): return map(int,stdin.readline().split())

N, Q = getline()
n=1
while (1<<n)<N: n+=1
sums=[[0]*((N>>i)+2) for i in range(n+1)]
flips=[[False]*((N>>i)+1) for i in range(n+1)]

# We keep information about the coins over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
#
# These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
# and the parent of I_i,j is I_i+1,j>>1.
#
# The parent interval I_i,j is the disjoint union of its two children.
# For example if i=3, j=5, then I_3,5=[40,48) is the union of
# I_2,10=[40,44) and I_2,11=[44,48).
#
# The interval [0,R) decomposes into a disjoint union of standard intervals
# I_i,j according to the binary expansion of R:
# If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
# [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
#
# The collection of all ancestor nodes of all standard intervals in the decomposition
# of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
#
# We maintain the invariant:
# Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
# as (i',j') ranges over the ancestors of (i,j) in the tree.

def flip(R):# flip coins [0,R)
    m=0;M=1
    while M<=R:
        if R&M:# flip coins [(R-M)&-M, R&-M)
            t=(R-M)>>m
            sums[m][t]=M-sums[m][t]
            flips[m][t]=not flips[m][t]# invert flip status to make descendants work
        m+=1;M<<=1
    # Fix ancestors of changed nodes
    m=1;M=2
    while M<=N:
        R2=R>>m
        sums[m][R2]=sums[m-1][2*R2]+sums[m-1][2*R2+1]
        if flips[m][R2]: sums[m][R2]=M-sums[m][R2]
        m+=1;M<<=1

def csum(R):# Sum over [0,R)
    s=0
    fl=False
    for m in range(n,-1,-1):
        M=1<<m
        if R&M:
            t=sums[m][(R-M)>>m]
            if fl: t=M-t
            s+=t
        fl=(fl!=flips[m][R>>m])# exclusive or
    return s

out=[]
for i in range(Q):
    op, A, B = getline()# operation, start, end (inclusive)
    B+=1
    if op==0: flip(A);flip(B)
    else: out.append(str(csum(B)-csum(A)))
print("\n".join(out))
Comments:
I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question, which was primarily about performance. – Alex Selby, Jan 26 at 10:43
It's clever code, but it's way too clever for someone like me! – Gareth Rees, Jan 26 at 11:00
I'll add more explanation. – Alex Selby, Jan 26 at 11:05
I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup every time, using list comprehensions for sums and flips, storing (R - M) >> m in a variable because it was computed three times, and delaying the output until the end of the loop, using a list), but it was still not enough... – Graipher, Jan 26 at 11:45
I've added a more detailed explanation with an example and links to the code. – Alex Selby, Jan 26 at 12:01
Answer by Graipher (score 6):
Since Codechef seems to support numpy, I would propose a solution like this:
import numpy as np

n, q = map(int, input().split())
coins = np.zeros(n, dtype=bool)
for _ in range(q):
    command, start, end = map(int, input().split())
    end += 1  # end is inclusive
    if command == 0:
        coins[start:end] = ~coins[start:end]
    elif command == 1:
        print(coins[start:end].sum())
This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).
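(A quick added illustration, not from the answer: for a NumPy boolean array, ~ negates elementwise.)
import numpy as np
a = np.array([True, False, True])
print(~a)  # [False  True False]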
I also made the names a bit clearer; there is no need to be terse here (50k bytes is the source code limit, and this is less than 500).
I also added some whitespace, according to Python's official style-guide, PEP8.
This also exceeds the time limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580s), so this is already a significant speed-up, but there is still a factor of 10 missing until the highest time limit (0.3s).
import random

n, q = 100000, 100000
data = "\n".join("{} {} {}".format(random.choice([0, 1]),
                                   *sorted([random.randrange(n),
                                            random.randrange(n)]))
                 for _ in range(q))
with open("flipping_coins_input.dat", "w") as f:
    f.write("{} {}\n".format(n, q))
    f.write(data)
The code above can be slightly sped up by delaying the output to after the loop, which reduces the number of flushes:
import numpy as np

n, q = map(int, input().split())
c = np.zeros(n, dtype=bool)
out = []
for _ in range(q):
    command, start, end = map(int, input().split())
    if command == 0:
        c[start:end + 1] = ~c[start:end + 1]
    elif command == 1:
        # print(c[start:end + 1].sum())
        out.append(str(c[start:end + 1].sum()))
print("\n".join(out))
This shaves off about 10% (so it takes 2.7s).
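A further variant worth trying (my addition, not part of the answer) is to avoid input() entirely and read all queries in one go from sys.stdin, since per-call input() overhead adds up over 100000 iterations. A sketch under that assumption:
import sys
import numpy as np

data = sys.stdin.buffer.read().split()
n, q = int(data[0]), int(data[1])
coins = np.zeros(n, dtype=bool)
out = []
pos = 2
for _ in range(q):
    command, start, end = int(data[pos]), int(data[pos + 1]), int(data[pos + 2]) + 1
    pos += 3
    if command == 0:
        coins[start:end] = ~coins[start:end]
    else:
        out.append(str(coins[start:end].sum()))
sys.stdout.write("\n".join(out) + "\n")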
Comments:
Looks pretty streamlined to me. Can't see how this fails the time limit. It could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input(). – Ev. Kounis, Jan 25 at 16:39
@Ev.Kounis Well, it does still take about 3s on my machine for a pessimistic input and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high end of that, hopefully). – Graipher, Jan 25 at 16:42
Then I would propose to skip the flipping entirely and just go with a % 2 approach. Since you have everything set up, could you try it maybe? The code is here. – Ev. Kounis, Jan 25 at 16:43
Your solution still brute forces. – vnp, Jan 25 at 21:02
@EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) also already takes half the time. So I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s). – Graipher, Jan 26 at 11:12
Answer by Alex Selby (score 6):
It looks like certain numpy operations give a very large constant factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Q\log N)$ tree-based pure Python solution (detailed in my other answer).
I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Q\log N)$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84s (by their timing), despite what they say elsewhere about 0.3s being the threshold. Perhaps Python solutions are allowed to run slower.
This is a table of timings:
- Program 1 = Graipher's numpy-based method
- Program 2 = $O(Q\log N)$ tree-based method
- Program 3 = numpy-based method with blocks (below)
Timings are in seconds using my computer (an ordinary desktop) on a random test file with $N=Q=100000$.
            Python 2   Python 3   pypy
Program 1   2.48       2.65       n/a
Program 2   1.58       2.16       0.29
Program 3   1.70       1.85       n/a
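Before the program itself, a tiny standalone demo I've added (not from the answer, and using 8-bit blocks instead of 64 purely for readability) of the xor masks it uses to flip part of a block:
s0, e0 = 2, 6                      # bit offsets within a block
full = (1 << 8) - 1                # pretend blocks are 8 bits wide
show = lambda m: format(m & full, "08b")
print(show(-(1 << s0)))            # 11111100: bits s0 and up (tail of the start block)
print(show((1 << e0) - 1))         # 00111111: bits 0..e0-1 (head of the end block)
print(show((1 << e0) - (1 << s0))) # 00111100: bits s0..e0-1 (range inside one block)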
from sys import stdin
import numpy as np

def getline(): return map(int,stdin.readline().split())

N, Q = getline()

# c0 = bitmap of heads, divided into blocks of size 64
N1=N//64+1
c0 = np.zeros(N1, dtype=np.int64)
# c1 = number of set bits for each block
c1 = np.full(N1, 0, dtype=int)

# Number of set bits in 16 bit range
btc16=[0]
for i in range(1,65536): btc16.append(btc16[i>>1]+(i&1))

# Number of set bits in 64 bit range
def btc64(x):
    return btc16[x&0xffff]+btc16[x>>16&0xffff]+btc16[x>>32&0xffff]+btc16[x>>48&0xffff]

out = []
for _ in range(Q):
    command, start, end = getline()
    end+=1
    # Decompose start into block number, s1, and bit number within block, s0
    s1=start>>6; s0=start&63
    # Same for end
    e1=end>>6; e0=end&63
    if command == 0:
        if s1<e1:
            c0[s1+1:e1]=~c0[s1+1:e1]; c1[s1+1:e1]=64-c1[s1+1:e1]
            c0[s1]^=-(1<<s0); c1[s1]=btc64(c0[s1])
            c0[e1]^=(1<<e0)-1; c1[e1]=btc64(c0[e1])
        else:
            c0[s1]^=(1<<e0)-(1<<s0); c1[s1]=btc64(c0[s1])
    elif command == 1:
        if s1<e1:
            t=btc64(c0[s1]&(-(1<<s0)))+c1[s1+1:e1].sum()+btc64(c0[e1]&((1<<e0)-1))
        else:
            t=btc64(c0[s1]&((1<<e0)-(1<<s0)))
        out.append(str(t))
print("\n".join(out))
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
14
down vote
accepted
The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.
You wrote:
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
When I read the assignment, this is what I saw:
There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:
1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...
Yet, when I look at your code, I see this:
for i in range(q):
j,k,l = list(map(int,input().split()))
You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.
Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:
N, num_opns = list(map(int, input().split()))
for i in range(num_opns):
operation, A, B = list(map(int, input().split()))
See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.
Adding Faster
Next, let's look at your two operations. Here's your first one, counting the 1 values:
for m in range(k,l+1):
if c[m] == 1:
co += 1
Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.
Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m]
is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:
for m in range(k,l+1):
co += c[m]
And get the same result!
Going further, it turns out that Python has a built-in sum
that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:
co = sum(c[k:l+1])
There's two things going on in there. The first, c[k:l+1]
is called a slicing expression. It creates a "sub-list" of the c
list, with just the values in question. The second is the use of the sum()
built-in.
Switching Faster
Your second operation just changes values:
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if
statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.
for m in range(k,l+1):
c[m] ^= 1
Alternatively, you could store the values in a tuple and map the old value to the new value:
new_values = (1, 0)
for m in range(k,l+1):
c[m] = new_values[c[m]]
Finally, I'll point you at the timeit
module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit
so you can keep track of what effect every change you make has on performance.
add a comment |Â
up vote
14
down vote
accepted
The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.
You wrote:
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
When I read the assignment, this is what I saw:
There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:
1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...
Yet, when I look at your code, I see this:
for i in range(q):
j,k,l = list(map(int,input().split()))
You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.
Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:
N, num_opns = list(map(int, input().split()))
for i in range(num_opns):
operation, A, B = list(map(int, input().split()))
See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.
Adding Faster
Next, let's look at your two operations. Here's your first one, counting the 1 values:
for m in range(k,l+1):
if c[m] == 1:
co += 1
Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.
Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m]
is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:
for m in range(k,l+1):
co += c[m]
And get the same result!
Going further, it turns out that Python has a built-in sum
that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:
co = sum(c[k:l+1])
There's two things going on in there. The first, c[k:l+1]
is called a slicing expression. It creates a "sub-list" of the c
list, with just the values in question. The second is the use of the sum()
built-in.
Switching Faster
Your second operation just changes values:
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if
statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.
for m in range(k,l+1):
c[m] ^= 1
Alternatively, you could store the values in a tuple and map the old value to the new value:
new_values = (1, 0)
for m in range(k,l+1):
c[m] = new_values[c[m]]
Finally, I'll point you at the timeit
module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit
so you can keep track of what effect every change you make has on performance.
add a comment |Â
up vote
14
down vote
accepted
up vote
14
down vote
accepted
The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.
You wrote:
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
When I read the assignment, this is what I saw:
There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:
1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...
Yet, when I look at your code, I see this:
for i in range(q):
j,k,l = list(map(int,input().split()))
You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.
Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:
N, num_opns = list(map(int, input().split()))
for i in range(num_opns):
operation, A, B = list(map(int, input().split()))
See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.
Adding Faster
Next, let's look at your two operations. Here's your first one, counting the 1 values:
for m in range(k,l+1):
if c[m] == 1:
co += 1
Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.
Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m]
is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:
for m in range(k,l+1):
co += c[m]
And get the same result!
Going further, it turns out that Python has a built-in sum
that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:
co = sum(c[k:l+1])
There's two things going on in there. The first, c[k:l+1]
is called a slicing expression. It creates a "sub-list" of the c
list, with just the values in question. The second is the use of the sum()
built-in.
Switching Faster
Your second operation just changes values:
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if
statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.
for m in range(k,l+1):
c[m] ^= 1
Alternatively, you could store the values in a tuple and map the old value to the new value:
new_values = (1, 0)
for m in range(k,l+1):
c[m] = new_values[c[m]]
Finally, I'll point you at the timeit
module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit
so you can keep track of what effect every change you make has on performance.
The code you have posted is straightforward and basic. Because of this, I'm going to assume you are not super-experienced with Python. I'll make some direct suggestions to go from your current code to a faster version of your code, but I don't expect you'll be able to make the time limit with this structure. See @vnp's answer for how to do that.
You wrote:
n,q = map(int,input().split())
c = [0]*n
for i in range(q):
j,k,l = list(map(int,input().split()))
co = 0
if j == 1:
#do Something
for m in range(k,l+1):
if c[m] == 1:
co += 1
print(co)
if j == 0:
#do something
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
When I read the assignment, this is what I saw:
There are N coins kept on the table, numbered from 0 to N - 1.
Initially, each coin is kept tails up. You have to perform two types
of operations:
1) Flip all coins numbered between A and B inclusive. This is
represented by the command "0 A B" ...
Yet, when I look at your code, I see this:
for i in range(q):
j,k,l = list(map(int,input().split()))
You aren't mapping the problem space into your code. This makes it harder for you- and me, and everyone else that sees it- to reason about your code.
Let me suggest right up front that you adopt names that reflect the problem statement whereever possible. Note that in cases where you get clever and use a different algorithm, the names are entirely up to you. In that case, they should not conflict with the problem statement:
N, num_opns = list(map(int, input().split()))
for i in range(num_opns):
operation, A, B = list(map(int, input().split()))
See the difference? N is N. There are operations that can be 0 or 1. Guess which is the operation? And there are parameters A and B, clearly marked. I haven't changed how it functions, but someone reading the code can now refer to the instructions and see what's what.
Adding Faster
Next, let's look at your two operations. Here's your first one, counting the 1 values:
for m in range(k,l+1):
if c[m] == 1:
co += 1
Your code looks at every value, and if it's 1, adds 1 to the count, and if it's not 1, adds nothing to the count.
Now, think like a programmer! Ask, "how can I get rid of any of that code?" In this case, what happens when c[m]
is not equal to 1? Why, then, it's equal to 0. And what happens when you add 0 to any number? Nothing - the number doesn't change. So you could change that code to:
for m in range(k,l+1):
co += c[m]
And get the same result!
Going further, it turns out that Python has a built-in sum
that adds up a sequence of values! You can switch from adding these numbers in Python to having Python's interpreter add the numbers in C. That should speed things up nicely:
co = sum(c[k:l+1])
There's two things going on in there. The first, c[k:l+1]
is called a slicing expression. It creates a "sub-list" of the c
list, with just the values in question. The second is the use of the sum()
built-in.
Switching Faster
Your second operation just changes values:
for m in range(k,l+1):
if c[m] == 1:
c[m] = 0
else:
c[m] = 1
There are a couple of ways to speed this up, and they all involve doing some kind of single operation instead of an if
statement. For example, you could perform a bitwise exclusive-or of the value with 1. Bitwise xor is also bitwise (no borrow) subtraction modulo 2, so when you xor things 0 becomes 1, and 1 becomes 0.
for m in range(k,l+1):
c[m] ^= 1
Alternatively, you could store the values in a tuple and map the old value to the new value:
new_values = (1, 0)
for m in range(k,l+1):
c[m] = new_values[c[m]]
Finally, I'll point you at the timeit
module, which is the standard Python way of measuring speed. If you are trying to beat a time limit in any challenge, make sure you use timeit
so you can keep track of what effect every change you make has on performance.
answered Jan 25 at 21:50
Austin Hastings
6,1591130
6,1591130
add a comment |Â
add a comment |Â
up vote
11
down vote
Thou shall not brute force.
There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of
0
operations.The key observation is that a
0 a b
operation is in fact a pair of operations: flip coins froma
to the end, and flip coins fromb + 1
to the end. This leads to the solution:- Decouple the
0 a b
operation intoa
andb+1
- Maintain the list of flip points, and keep it ordered
- To perform a
1
operation, find the largest flip point beforeA
. If its index in flip list is even, theA
coin is tails up, otherwise it is heads up. Then, traverse the flip list untilB
adding up lengths of every other interval.
- Decouple the
The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.
I'd be interested to see how you came up withO(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst caseO(logQ)
â TemporalWolf
Jan 25 at 21:18
1
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
1
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
 |Â
show 4 more comments
up vote
11
down vote
Thou shall not brute force.
There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of
0
operations.The key observation is that a
0 a b
operation is in fact a pair of operations: flip coins froma
to the end, and flip coins fromb + 1
to the end. This leads to the solution:- Decouple the
0 a b
operation intoa
andb+1
- Maintain the list of flip points, and keep it ordered
- To perform a
1
operation, find the largest flip point beforeA
. If its index in flip list is even, theA
coin is tails up, otherwise it is heads up. Then, traverse the flip list untilB
adding up lengths of every other interval.
- Decouple the
The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.
I'd be interested to see how you came up withO(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst caseO(logQ)
â TemporalWolf
Jan 25 at 21:18
1
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
1
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
 |Â
show 4 more comments
up vote
11
down vote
up vote
11
down vote
Thou shall not brute force.
There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of
0
operations.The key observation is that a
0 a b
operation is in fact a pair of operations: flip coins froma
to the end, and flip coins fromb + 1
to the end. This leads to the solution:- Decouple the
0 a b
operation intoa
andb+1
- Maintain the list of flip points, and keep it ordered
- To perform a
1
operation, find the largest flip point beforeA
. If its index in flip list is even, theA
coin is tails up, otherwise it is heads up. Then, traverse the flip list untilB
adding up lengths of every other interval.
- Decouple the
The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.
Thou shall not brute force.
There is no reason to maintain the entire list of coin states, much less to perform actual flips. The state is easily recoverable from the sequence of
0
operations.The key observation is that a
0 a b
operation is in fact a pair of operations: flip coins froma
to the end, and flip coins fromb + 1
to the end. This leads to the solution:- Decouple the
0 a b
operation intoa
andb+1
- Maintain the list of flip points, and keep it ordered
- To perform a
1
operation, find the largest flip point beforeA
. If its index in flip list is even, theA
coin is tails up, otherwise it is heads up. Then, traverse the flip list untilB
adding up lengths of every other interval.
- Decouple the
The space complexity is $O(Q)$; the time complexity is $O(Qlog Q)$.
answered Jan 25 at 21:01
vnp
36.6k12991
36.6k12991
I'd be interested to see how you came up withO(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst caseO(logQ)
â TemporalWolf
Jan 25 at 21:18
1
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
1
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
 |Â
show 4 more comments
I'd be interested to see how you came up withO(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst caseO(logQ)
â TemporalWolf
Jan 25 at 21:18
1
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
1
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
I'd be interested to see how you came up with
O(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
â TemporalWolf
Jan 25 at 21:18
I'd be interested to see how you came up with
O(QlogQ)
as the time complexity, specifically how you can say the time complexity for any read operation is in the worst case O(logQ)
â TemporalWolf
Jan 25 at 21:18
1
1
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
The number of flips or reads are $O(Q)$. The vector of stored range-begin/range-end marks is likewise. Searching a sorted list for a value is going to be $O(logQ)$. Since you have to do a linear scan for the summation that adds $Q$. Since the number of reads is $O(Q)$, I think you end up with $O(Q * (Q + logQ))$ which is $O(Q²)$. But that's better than $O(Q * N)$ in this case.
â Austin Hastings
Jan 25 at 21:37
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
Hm, I tried implementing this (in pure Python so far) and I get a runtime of about 20s. Might ask a question with that code tomorrow, my implementation can probably be greatly improved.
â Graipher
Jan 25 at 22:15
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
@Graipher I suspect worst case scenarios can be produced which make either approach slower than the other, but I would expect in the average case this would win out. That being said, I haven't implemented it, so that's speculation.
â TemporalWolf
Jan 25 at 22:31
1
1
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
And how about writing some code?
â Ev. Kounis
Jan 26 at 9:04
 |Â
show 4 more comments
up vote
11
down vote
I think the problem authors maybe intended for people to use a data structure that allows for fast queries and updates. You can make it run in $Qlog(N)$ time by using a tree structure.
First note that flipping coins in the interval $[l,r)$ is the same as flipping $[0,r)$ and $[0,l)$, so we are reduced to flipping $[0,r)$. The idea is not to actually flip anything, but to use a binary tree which records the number, $n_i,j$ of heads over intervals of the form $I_i,j=2^i[j,j+1)$. The children of $I_i,j$ are $I_i-1,2j$ and $I_i-1,2j+1$.
Each update caused by flipping $[0,r)$ can be broken down as a sequence of flips of intervals $I_i,p_i$, where $p_i=sum_j>i2^j-ir_j$ are derived from the binary expansion $sum_jge02^jr_j$ of $r$. To flip $[0,r)$ you have to (i) flip $I_i,p_i$ for each $r_i=1$, which means changing $n_i,p_i$ to $2^i-n_i,p_i$, (ii) fix up the ancestor nodes of the form $I_i+1,p_i/2$, (iii) make a note of a flip at the node $(i,p_i)$ in a separate array so that all descendant nodes can be reversed when they are queried.
(iii) saves the trouble of doing an order $N$ amount of work fixing up child nodes, at the cost of causing queries an extra $log_2(N)$ work checking all ancestors for flips. Actually this isn't really much extra work because the query already took $log_2(N)$ time.
There is an example implementation below. (Sorry, it's not pretty.)
Edit to add: I tried submitting this to Codechef and it passed in the pypy and Python 2 categories (with the fastest running time and the only valid entry respectively), but timed out in Python 3. Presumably it is only just fast enough as Python 2, and only just too slow as Python 3.
Edit to add more detailed explanation:
The invariant that is maintained is that the number of heads over the range $I_i,j=[j$<<$i,(j+1)$<<$i)$ is equal to sums$[i][j]$ or $2^i-$sums$[i][j]$ according to whether $sum_d>0 $flips$[i+d][j$>>$d]$ is even or odd respectively.
This sounds awkward, but it's just a binary tree, or rather, two identically-indexed binary trees: one corresponding to sums (integers) and one to flips (booleans).
The nodes are indexed by $(i,j)$ for $0le j<2^n-i$, where $n$ is equal to $log_2(N)$ rounded up to the nearest integer. The node at $(i,j)$ represents the interval $[2^i.j,2^i(j+1))$. For $i>0$, the children of $(i,j)$ are $(i-1,2j)$ and $(i-1,2j+1)$, being the two intervals that together form the parent interval, while the $(0,j)$ nodes are leaves.
For example, suppose $N=32$, $n=5$, and we wish to flip the interval $[0,23)$. ("Flipping this interval" is a shorthand for updating the trees sums and flips maintaining the invariant, as if the coins in positions $0,ldots,22$ had all been flipped.)
First write $23$ in binary as $23=16+4+2+1$, then separately flip the intervals $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.
For example, what should we do to flip the interval $[16,20) = I_2,4$?
First we need to change sums$[2][4]$ to $4-$sums$[2][4]$. That fixes queries of $I_2,4$ itself. (This corresponds to the loop in the code prefaced by the comment "flip coins [(R-M)&-M, R&-M)".)
Then we need to fix the count for the parent interval, $[16,24) = I_3,2$. This can be done by setting the number of heads in $[16,24)$ equal to our new number for $[16,20)$ plus that in $[20,24)$. Then the grandparent interval, $[16,32)$ needs to be adjusted by rewriting its head count to be that of $[16,24)$ plus that of $[24,32)$. (This corresponds to the loop in the code prefaced by the comment "Fix ancestors of changed nodes". Though actually these operations are amalgamated for all of $[0,16)$, $[16,20)$, $[20,22)$, $[22,23)$.)
Then we need to fix the count for queries of all descendant intervals: $[16,18)$, $[18,20)$, $[16,17)$, $[17,18)$, $[18,19)$, $[19,20)$. In general there could be quite a lot of these, and if we did something manually for all of them, it would ruin our complexity order. So instead we defer this operation to query time (i.e., op=1, where we seek the head count) and just note that we need it to behave as if all of these intervals had be flipped in their entirety. This is effected by inverting the boolean flips$[2][4]$, corresponding to the interval $[16,20)$. This deferred evaluation puts an obligation at query time to check the flip status of all parent nodes, inverting the headcount if the number of ancestor flip booleans is odd. (Variable fl in csum().)
Note: the code below is $O(N+Qlog(N))$ rather than $O(Qlog(N))$. The additional $N$ comes from allocating the arrays sums
and flips
. This term could be removed by using dictionaries instead, replacing the initialisation of sums
and flips
by
from collections import defaultdict
sums=[defaultdict(int) for i in range(n+1)]
flips=[defaultdict(bool) for i in range(n+1)]
but I've left them as arrays because that runs faster for the typical values used here, roughly $N=Q=100000$.
from sys import stdin
def getline(): return map(int,stdin.readline().split())
N, Q = getline()
n=1
while (1<<n)<N: n+=1
sums=[[0]*((N>>i)+2) for i in range(n+1)]
flips=[[False]*((N>>i)+1) for i in range(n+1)]
# We keep information about the coins over "standard intervals" I_i,j=[j<<i,(j+1)<<i).
#
# These form a binary tree where the children of I_i,j are I_i-1,2j and I_i-1,2j+1,
# and the parent of I_i,j is I_i+1,j>>1.
#
# The parent interval I_i,j is the disjoint union of its two children.
# For example if i=3, j=5, then I_3,5=[40,48) is the union of
# I_2,10=[40,44) and I_2,11=[44,48).
#
# The interval [0,R) decomposes into a disjoint union of standard intervals
# I_i,j according to the binary expansion of R:
# If R = r_n r_n-1 ... r_0 in binary (r_i=0 or 1), then
# [0,R) is the disjoint union of I_i,(R>>i)-1 over i such that r_i=1.
#
# The collection of all ancestor nodes of all standard intervals in the decomposition
# of [0,R) is I_i,R>>i over all i>0 (whether or not r_i = 0 or 1).
#
# We maintain the invariant:
# Number of heads over the range I_i,j = sums[i][j] modified by flips[i'][j']
# as (i',j') ranges over the ancestors of (i,j) in the tree.
def flip(R):  # flip coins [0,R)
    m = 0; M = 1
    while M <= R:
        if R & M:  # flip coins [(R-M)&-M, R&-M)
            t = (R - M) >> m
            sums[m][t] = M - sums[m][t]
            flips[m][t] = not flips[m][t]  # invert flip status to make descendants work
        m += 1; M <<= 1
    # Fix ancestors of changed nodes
    m = 1; M = 2
    while M <= N:
        R2 = R >> m
        sums[m][R2] = sums[m-1][2*R2] + sums[m-1][2*R2+1]
        if flips[m][R2]: sums[m][R2] = M - sums[m][R2]
        m += 1; M <<= 1
def csum(R):  # Sum over [0,R)
    s = 0
    fl = False
    for m in range(n, -1, -1):
        M = 1 << m
        if R & M:
            t = sums[m][(R - M) >> m]
            if fl: t = M - t
            s += t
        fl = (fl != flips[m][R >> m])  # exclusive or
    return s
out = []
for i in range(Q):
    op, A, B = getline()  # operation, start, end (inclusive)
    B += 1
    if op == 0: flip(A); flip(B)
    else: out.append(str(csum(B) - csum(A)))
print("\n".join(out))
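As a usage note (my addition, not part of the original answer): a tiny brute-force reference makes it easy to sanity-check the tree logic on small random inputs before submitting. The expected answers in the example below are worked out by hand.

def brute_force(n, queries):
    # Naive O(N)-per-query reference: returns the answers to all type-1 queries.
    coins = [0] * n
    answers = []
    for op, a, b in queries:
        if op == 0:
            for i in range(a, b + 1):
                coins[i] ^= 1
        else:
            answers.append(sum(coins[a:b + 1]))
    return answers

# Flipping [1,3] gives three heads; flipping [2,4] then leaves heads at 1 and 4 only.
assert brute_force(5, [(0, 1, 3), (1, 0, 4), (0, 2, 4), (1, 0, 4)]) == [3, 2]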
I was answering the question taken as a question about algorithms and performance. Perhaps the code is not suited to this site, hence the apology upfront. I think it does answer the question of how to solve this problem in principle with the correct complexity, and so addresses the original question, which was primarily about performance.
– Alex Selby, Jan 26 at 10:43
It's clever code, but it's way too clever for someone like me!
– Gareth Rees, Jan 26 at 11:00
I'll add more explanation.
– Alex Selby, Jan 26 at 11:05
I tried running your code in Python 3.5, adding a few performance enhancements (saving stdin.readline to a variable to avoid the lookup every time, using list comprehensions for sums and flips, storing (R - M) >> m in a variable because it was computed three times, and delaying the output until the end of the loop, using a list), but it was still not enough...
– Graipher, Jan 26 at 11:45
I've added a more detailed explanation with an example and links to the code.
– Alex Selby, Jan 26 at 12:01
up vote
6
down vote
Since Codechef seems to support numpy, I would propose a solution like this:
import numpy as np

n, q = map(int, input().split())
coins = np.zeros(n, dtype=bool)
for _ in range(q):
    command, start, end = map(int, input().split())
    end += 1  # end is inclusive
    if command == 0:
        coins[start:end] = ~coins[start:end]
    elif command == 1:
        print(coins[start:end].sum())
This uses the trick mentioned here to flip the (sub-)array, using the unary negation operator ~. It also makes sure that the second comparison is only run when the first one fails (since only one of the two commands can be run).
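As a small aside (my addition, not from the answer): for a boolean array the same flip can also be done fully in place with an XOR against True, avoiding the temporary that ~ creates; whether that helps measurably here would need benchmarking.

import numpy as np

coins = np.zeros(10, dtype=bool)
coins[2:7] = ~coins[2:7]   # the answer's approach: negate, then copy back into the slice
coins[4:9] ^= True         # alternative: in-place XOR on the slice view
print(coins.astype(int))   # -> [0 0 1 1 0 0 0 1 1 0]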
I also made the names a bit clearer; there is no need to be terse here (50k bytes is the source-code limit, and this is less than 500).
I also added some whitespace, following Python's official style guide, PEP 8.
This also exceeds the time limit, though. On my machine this takes about 3.1 s with an input file generated with the code below, which uses the upper bounds for $N$ and $Q$. For comparison, the OP's code takes almost 10 minutes (580 s), so this is already a significant speed-up, but there is still a factor of 10 missing to get under even the highest time limit (0.3 s).
import random

n, q = 100000, 100000
data = "\n".join("{} {} {}".format(random.choice([0, 1]),
                                   *sorted([random.randrange(n),
                                            random.randrange(n)]))
                 for _ in range(q))
with open("flipping_coins_input.dat", "w") as f:
    f.write("{} {}\n".format(n, q))
    f.write(data)
The code above can be slightly sped up by delaying the output until after the loop, which reduces the number of flushes:
import numpy as np

n, q = map(int, input().split())
c = np.zeros(n, dtype=bool)
out = []
for _ in range(q):
    command, start, end = map(int, input().split())
    if command == 0:
        c[start:end + 1] = ~c[start:end + 1]
    elif command == 1:
        # print(c[start:end + 1].sum())
        out.append(str(c[start:end + 1].sum()))
print("\n".join(out))
This shaves off about 10% (so it takes 2.7s).
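One more variation worth trying (my sketch, combining the batched output above with the sys.stdin idea used in the other answers): read all of stdin in one go and walk the tokens, which removes the per-line input() overhead entirely. I have not benchmarked this against the Codechef limit.

import sys
import numpy as np

def main():
    data = sys.stdin.buffer.read().split()   # one read for the whole input
    n, q = int(data[0]), int(data[1])
    coins = np.zeros(n, dtype=bool)
    out = []
    pos = 2
    for _ in range(q):
        op, a, b = int(data[pos]), int(data[pos + 1]), int(data[pos + 2])
        pos += 3
        if op == 0:
            coins[a:b + 1] ^= True           # in-place flip of the inclusive range
        else:
            out.append(str(int(coins[a:b + 1].sum())))
    sys.stdout.write("\n".join(out) + "\n")

if __name__ == "__main__":
    main()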
Looks pretty streamlined to me. Can't see how this fails the time limit... It could be that the input has to be read in a different way and the time limit is reached because nothing is fed to the input().
– Ev. Kounis, Jan 25 at 16:39
@Ev.Kounis Well, it does still take about 3 s on my machine for a pessimistic input, and the time limit is 0.15 - 0.3 s (I guess language-specific, and Python will be towards the high end of that, hopefully).
– Graipher, Jan 25 at 16:42
Then I would propose to skip the flipping entirely and just go with a %2 approach. Since you have everything set up, could you try it maybe? The code is here.
– Ev. Kounis, Jan 25 at 16:43
Your solution still brute forces.
– vnp, Jan 25 at 21:02
@EricDuminil On mine it takes almost the full time (using time python3 -c "import numpy as np"). But cat flipping_coins.dat (so just piping the file content for the worst case) also already takes half the time, so I'm not sure how there can be any successful submissions under 0.1s (the fastest C++ was 0.01s).
– Graipher, Jan 26 at 11:12
up vote
6
down vote
It looks like certain numpy operations give a very large constant-factor speed benefit, as Graipher demonstrated in his answer. Graipher's solution is $O(QN)$ but is competitive with the $O(Q\log(N))$ tree-based pure Python solution (detailed in my other answer).
I wondered if it was possible to adapt Graipher's solution so that it passed under the Codechef time threshold using Python 3.5. The attempt below divides the bitmap up into chunks of 64, caching the totals of each chunk. Under Python 3 it's slightly faster than the $O(Q\log(N))$ solution and does pass the Codechef time threshold. They accept a solution that takes 0.84 s (by their timing), despite what they say elsewhere about 0.3 s being the threshold. Perhaps Python solutions are allowed to run slower.
This is a table of timings:
- Program 1 = Graipher's numpy-based method
- Program 2 = $O(Q\log(N))$ tree-based method
- Program 3 = numpy-based method with blocks (below)
Timings are in seconds using my computer (an ordinary desktop) on a random test file with $N=Q=100000$.
            Python 2   Python 3   pypy
Program 1     2.48       2.65     n/a
Program 2     1.58       2.16     0.29
Program 3     1.70       1.85     n/a
from sys import stdin
import numpy as np

def getline(): return map(int, stdin.readline().split())

N, Q = getline()
# c0 = bitmap of heads, divided into blocks of size 64
N1 = N//64 + 1
c0 = np.zeros(N1, dtype=np.int64)
# c1 = number of set bits for each block
c1 = np.full(N1, 0, dtype=int)
# Number of set bits in 16 bit range
btc16 = [0]
for i in range(1, 65536): btc16.append(btc16[i>>1] + (i&1))
# Number of set bits in 64 bit range
def btc64(x):
    return btc16[x&0xffff] + btc16[x>>16&0xffff] + btc16[x>>32&0xffff] + btc16[x>>48&0xffff]

out = []
for _ in range(Q):
    command, start, end = getline()
    end += 1
    # Decompose start into block number, s1, and bit number within block, s0
    s1 = start>>6; s0 = start&63
    # Same for end
    e1 = end>>6; e0 = end&63
    if command == 0:
        if s1 < e1:
            c0[s1+1:e1] = ~c0[s1+1:e1]; c1[s1+1:e1] = 64 - c1[s1+1:e1]
            c0[s1] ^= -(1<<s0); c1[s1] = btc64(c0[s1])
            c0[e1] ^= (1<<e0)-1; c1[e1] = btc64(c0[e1])
        else:
            c0[s1] ^= (1<<e0)-(1<<s0); c1[s1] = btc64(c0[s1])
    elif command == 1:
        if s1 < e1:
            t = btc64(c0[s1]&(-(1<<s0))) + c1[s1+1:e1].sum() + btc64(c0[e1]&((1<<e0)-1))
        else:
            t = btc64(c0[s1]&((1<<e0)-(1<<s0)))
        out.append(str(t))
print("\n".join(out))
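As a side note (my addition, not part of the original answer): the 16-bit table is only one way to do a population count. bin(x).count("1") works on any Python version, and int.bit_count() exists from Python 3.10, so a drop-in alternative to btc64 could look like the sketch below; whether it beats the table here would need measuring.

def btc64_alt(x):
    # Same result as btc64 above: count the set bits of a 64-bit block.
    # Convert to a plain Python int first so the mask also handles negative
    # two's-complement values coming out of the int64 array.
    return bin(int(x) & 0xffffffffffffffff).count("1")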