For-Loop with numpy array too slow
I'm trying to speed up the following computation. I have two functions that are used together.

The first function simulates N coin flips and returns an array of strings with the results:

import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H', 'T'], size=N)

The second function takes number_of_samples and sample_size and returns sample_probs. It simulates number_of_samples runs of sample_size coin flips each and stores num_heads / sample_size in sample_probs for each run:

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs

Now I want to simulate runs for several different sample sizes, with 200 runs per sample size. So for sample_size = 1 I would do 200 runs of 1 coin flip, for sample_size = 10 I would do 200 runs of 10 coin flips, and so on. Here is the code that takes so long:

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)

I am pretty sure this process can be improved by getting rid of the for loops and using array operations instead, but I just cannot think of how to apply throw_a_coin or make_throws to an array.
Tags: python, array, numpy
asked Apr 15 at 10:43 by David P
To the 2 people who voted to close this question: what exactly is wrong with it? – Mast, Apr 15 at 11:42

@… The code looks very real to me. All required methods are in place, so this is fully reviewable. – Simon Forsberg♦, Apr 15 at 13:02
1 Answer
Here is the toy example I built from your snippets:
import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H', 'T'], size=N)

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
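For reference, the profiles below come from cProfile; a minimal sketch of collecting such a profile programmatically (assuming the driver loop above sits between enable() and disable()):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# ... run the simulation loop from above here ...

profiler.disable()
# Print the most expensive entries, sorted by cumulative time
pstats.Stats(profiler).sort_stats("cumtime").print_stats(10)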
cProfile (ordered by cumtime) reveals the problem (runs in 82.359 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 82.359 82.359 built-in method builtins.exec
1 0.002 0.002 82.359 82.359 fast_flip.py:1(<module>)
500 0.940 0.002 82.200 0.164 fast_flip.py:8(make_throws)
100000 79.111 0.001 79.111 0.001 built-in method builtins.sum
100000 0.058 0.000 2.148 0.000 fast_flip.py:4(throw_a_coin)
100000 1.455 0.000 2.090 0.000 method 'choice' of 'mtrand.RandomState' objects
100000 0.207 0.000 0.635 0.000 fromnumeric.py:2456(prod)
100000 0.026 0.000 0.429 0.000 _methods.py:34(_prod)
101500 0.409 0.000 0.409 0.000 method 'reduce' of 'numpy.ufunc' objects
6 0.000 0.000 0.176 0.029 __init__.py:1(<module>)
There is a steep gap after builtins.sum, i.e. most of the time is spent there. We can use np.sum instead (pushing the runtime down to 3.457 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 3.457 3.457 built-in method builtins.exec
1 0.002 0.002 3.457 3.457 fast_flip.py:1(<module>)
500 0.905 0.002 3.307 0.007 fast_flip.py:8(make_throws)
100000 0.046 0.000 1.869 0.000 fast_flip.py:4(throw_a_coin)
100000 1.287 0.000 1.823 0.000 method 'choice' of 'mtrand.RandomState' objects
201500 0.702 0.000 0.702 0.000 method 'reduce' of 'numpy.ufunc' objects
100000 0.172 0.000 0.536 0.000 fromnumeric.py:2456(prod)
100000 0.136 0.000 0.532 0.000 fromnumeric.py:1778(sum)
100000 0.023 0.000 0.378 0.000 _methods.py:31(_sum)
100000 0.021 0.000 0.364 0.000 _methods.py:34(_prod)
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
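As a sketch, the only change for this step is swapping the builtin sum for np.sum inside make_throws:

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # np.sum reduces the boolean array inside NumPy instead of
        # looping over it element by element in pure Python
        num_heads = np.sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs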
Further, we can replace the strings "H" and "T" with booleans and stay in NumPy for longer (down to 1.633 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 1.633 1.633 built-in method builtins.exec
1 0.001 0.001 1.633 1.633 fast_flip.py:1(<module>)
500 0.101 0.000 1.485 0.003 fast_flip.py:11(make_throws)
100000 0.169 0.000 0.893 0.000 fast_flip.py:4(throw_a_coin)
100000 0.724 0.000 0.724 0.000 method 'uniform' of 'mtrand.RandomState' objects
100000 0.122 0.000 0.491 0.000 fromnumeric.py:1778(sum)
100000 0.024 0.000 0.354 0.000 _methods.py:31(_sum)
101500 0.334 0.000 0.334 0.000 method 'reduce' of 'numpy.ufunc' objects
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
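A sketch consistent with the profile above (where 'uniform' has replaced 'choice'):

def throw_a_coin(N):
    # True stands for heads; a fair coin is a uniform draw above 0.5
    return np.random.uniform(size=N) > 0.5

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # Summing a boolean array counts the heads directly
        num_heads = np.sum(throw_a_coin(sample_size))
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs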
Next, we can get rid of throw_a_coin and instead sample a number_of_samples x sample_size array of uniformly distributed random numbers and threshold them. This also lets us vectorize the inner for loop and stay in NumPy even longer (0.786 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 0.786 0.786 built-in method builtins.exec
1 0.003 0.003 0.786 0.786 fast_flip.py:1(<module>)
500 0.053 0.000 0.634 0.001 fast_flip.py:4(make_throws)
500 0.526 0.001 0.526 0.001 method 'uniform' of 'mtrand.RandomState' objects
6 0.000 0.000 0.179 0.030 __init__.py:1(<module>)
Here is the code:
import numpy as np

def make_throws(number_of_samples, sample_size):
    # True == heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    sample_probs = np.sum(throws, axis=1) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
At this point we could start to tackle the remaining for loop, but that would get very micro-optimized. One might consider aggregating the results in prob and then using np.mean(prob, axis=1) and np.std(prob, axis=1) outside the loop, but that only nets about 20 ms, so it's more of a personal-preference thing.
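A minimal sketch of that aggregation, assuming every run keeps number_of_samples = 200:

# Collect all 200-element prob arrays first, then reduce once outside the loop
probs = np.empty((len(sample_sizes), 200))
for i, sample_size in enumerate(sample_sizes):
    probs[i] = make_throws(200, sample_size)

mean_of_sample_means = np.mean(probs, axis=1)
std_dev_of_sample_means = np.std(probs, axis=1)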
answered Apr 15 at 14:19 by FirefoxMetzger (accepted)
Wow, very detailed and very helpful! I didn't know I could see how long each single operation takes! Thank you :) – David P, Apr 15 at 17:32

@DavidP Yes, that's what profilers are for. They exist in every major language, so this is not a Python-specific thing. There is also LineProfiler, which lets you view the runtime per line for specific functions. – FirefoxMetzger, Apr 15 at 17:52
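A minimal sketch of such per-line profiling, assuming the third-party line_profiler package is installed (function names reuse the ones from the answer):

from line_profiler import LineProfiler

lp = LineProfiler()
profiled = lp(make_throws)  # wrap the function to record per-line timings
profiled(200, 100)          # example call; 100 is an arbitrary sample size
lp.print_stats()            # prints the time spent on each line of make_throws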
I will save this comment in my brain! I didn't know about profilers, and they will help me for sure in the future, thanks a lot! – David P, Apr 16 at 16:42