For-Loop with numpy array too slow

I'm trying to speed up the execution of the following problem:



I have two functions which are used together.



Here are the functions explained:



The first function simulates N coin flips and returns an array of strings with the results:



def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)


The second function takes number_of_samples and sample_size and returns sample_probs.
It simulates number_of_samples runs of sample_size coin flips each and stores num_heads / sample_size in sample_probs for each run.



Here is the code:



def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs


Now I want to simulate multiple runs with different sample sizes, doing 200 runs for each sample size. So for sample_size = 1 I would do 200 runs of 1 coin flip, for sample_size = 10 I would do 200 runs of 10 coin flips, and so on.



Here is the code that takes so long:



mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)


I am pretty sure that this process can be improved by getting rid of the for loops and using array operations instead, but I just cannot think of how to apply throw_a_coin or make_throws to an array.







asked Apr 15 at 10:43 by David P

  • To the 2 people who voted to close this question: what exactly is wrong with it? – Mast, Apr 15 at 11:42

  • @πάνταῥεῖ The code looks very real to me. All required methods are in place, so this is fully reviewable. – Simon Forsberg♦, Apr 15 at 13:02

1 Answer

Here is the toy example I built from your snippets:



import numpy as np


def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)


def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)


cProfile (ordered by cumtime) reveals the problem (runs in 82.359 seconds):



ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 82.359 82.359 built-in method builtins.exec
1 0.002 0.002 82.359 82.359 fast_flip.py:1(<module>)
500 0.940 0.002 82.200 0.164 fast_flip.py:8(make_throws)
100000 79.111 0.001 79.111 0.001 built-in method builtins.sum
100000 0.058 0.000 2.148 0.000 fast_flip.py:4(throw_a_coin)
100000 1.455 0.000 2.090 0.000 method 'choice' of 'mtrand.RandomState' objects
100000 0.207 0.000 0.635 0.000 fromnumeric.py:2456(prod)
100000 0.026 0.000 0.429 0.000 _methods.py:34(_prod)
101500 0.409 0.000 0.409 0.000 method 'reduce' of 'numpy.ufunc' objects
6 0.000 0.000 0.176 0.029 __init__.py:1(<module>)
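
For reference, output like the listing above can be produced with the standard library profiler, e.g. by running python -m cProfile -s cumtime fast_flip.py from the command line, or from within Python (assuming the script body is wrapped in a main() function):

import cProfile
cProfile.run("main()", sort="cumtime")  # main() is an assumption; profile whatever entry point you have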


There is a steep gap after builtins.sum, i.e. you spend most of the time there. We can use np.sum instead, which pushes the total runtime down to 3.457 seconds:



ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 3.457 3.457 built-in method builtins.exec
1 0.002 0.002 3.457 3.457 fast_flip.py:1(<module>)
500 0.905 0.002 3.307 0.007 fast_flip.py:8(make_throws)
100000 0.046 0.000 1.869 0.000 fast_flip.py:4(throw_a_coin)
100000 1.287 0.000 1.823 0.000 method 'choice' of 'mtrand.RandomState' objects
201500 0.702 0.000 0.702 0.000 method 'reduce' of 'numpy.ufunc' objects
100000 0.172 0.000 0.536 0.000 fromnumeric.py:2456(prod)
100000 0.136 0.000 0.532 0.000 fromnumeric.py:1778(sum)
100000 0.023 0.000 0.378 0.000 _methods.py:31(_sum)
100000 0.021 0.000 0.364 0.000 _methods.py:34(_prod)
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
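
The change itself is just one line inside make_throws; a minimal sketch, with the rest of the function unchanged:

num_heads = np.sum(throw_a_coin(sample_size) == "H")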


Further, we can replace the strings "H" and "T" with Booleans and stay in numpy for longer (down to 1.633 seconds):



ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 1.633 1.633 built-in method builtins.exec
1 0.001 0.001 1.633 1.633 fast_flip.py:1(<module>)
500 0.101 0.000 1.485 0.003 fast_flip.py:11(make_throws)
100000 0.169 0.000 0.893 0.000 fast_flip.py:4(throw_a_coin)
100000 0.724 0.000 0.724 0.000 method 'uniform' of 'mtrand.RandomState' objects
100000 0.122 0.000 0.491 0.000 fromnumeric.py:1778(sum)
100000 0.024 0.000 0.354 0.000 _methods.py:31(_sum)
101500 0.334 0.000 0.334 0.000 method 'reduce' of 'numpy.ufunc' objects
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
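
This intermediate version is not shown explicitly above, so the following is only a sketch of what it might look like, assuming heads are encoded as True and drawn by thresholding uniform samples (consistent with the 'uniform' calls in the profile):

def throw_a_coin(N):
    # True == heads (assumed encoding, replacing the original string version)
    return np.random.uniform(size=N) > 0.5


def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = np.sum(throw_a_coin(sample_size))  # booleans sum as 0/1
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs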


Next, we can get rid of throw_a_coin entirely and instead sample a number_of_samples x sample_size array of uniformly distributed random numbers and threshold it. This also allows us to vectorize the inner for loop and stay in numpy even longer (0.786 seconds):



ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 0.786 0.786 built-in method builtins.exec
1 0.003 0.003 0.786 0.786 fast_flip.py:1(<module>)
500 0.053 0.000 0.634 0.001 fast_flip.py:4(make_throws)
500 0.526 0.001 0.526 0.001 method 'uniform' of 'mtrand.RandomState' objects
6 0.000 0.000 0.179 0.030 __init__.py:1(<module>)


Here is the code:



import numpy as np


def make_throws(number_of_samples, sample_size):
    # True == Heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    sample_probs = np.sum(throws, axis=1) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))

for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)


At this point we could start to tackle the remaining for loop, but it would get very micro-optimized. One might consider aggregating the results in prob and then using np.mean(prob, axis=1) and np.std(prob, axis=1) outside the loop, but that only nets about 20 ms, so it's more of a personal preference thing.
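
A minimal sketch of that variant, assuming all sample sizes keep the same 200 runs so the per-size results can be stacked into one 2D array:

probs = np.stack([make_throws(200, n) for n in sample_sizes])  # shape (len(sample_sizes), 200)
mean_of_sample_means = np.mean(probs, axis=1)
std_dev_of_sample_means = np.std(probs, axis=1)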






answered Apr 15 at 14:19 by FirefoxMetzger (accepted)

  • Wow, very detailed and very helpful! I didn't know I could see how long each single operation takes! Thank you :) – David P, Apr 15 at 17:32

  • @DavidP Yes, that's what profilers are for. They exist in every major language, so this is not a Python-specific thing. There is also LineProfiler, which allows you to nicely view runtime per line for specific functions. – FirefoxMetzger, Apr 15 at 17:52

  • I will save this comment in my brain! I didn't know about profilers, and they will help me for sure in the future, thanks a lot! – David P, Apr 16 at 16:42