Calculating the Average Minimum Length of Random Numbers From 0 - 1 that Adds up to Greater Than 1
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
8
down vote
favorite
My Computer Science teacher gave us a task to calculate the average after one billion trials.
His exact assignment was this:
Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)
Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.
Examples:
- 0.449 0.814
- length of sequence: 2
- 0.167 0.138 0.028 0.934
- length of sequence: 4
- 0.640 0.258 0.417
- length of sequence: 3
- 0.911 0.212
- length of sequence: 2
Average of the four lengths is 11/4 â 2.75
What is the average of one billion random sequences?
My code was this:
import random
def genSequence():
storenums = 0
numTrials = 1000000000
for x in range(0,numTrials):
numberOfAttempts = 0
getToOne = 0
while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
getToOne += random.random()
numberOfAttempts += 1
storenums = storenums + numberOfAttempts
#print (x)
#print(storenums)
calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
average = num/den
print(average)
genSequence()
*Note: I am using repl.it to run my code so there is no main.
The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.
EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.
python python-3.x time-limit-exceeded memory-optimization
add a comment |Â
up vote
8
down vote
favorite
My Computer Science teacher gave us a task to calculate the average after one billion trials.
His exact assignment was this:
Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)
Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.
Examples:
- 0.449 0.814
- length of sequence: 2
- 0.167 0.138 0.028 0.934
- length of sequence: 4
- 0.640 0.258 0.417
- length of sequence: 3
- 0.911 0.212
- length of sequence: 2
Average of the four lengths is 11/4 â 2.75
What is the average of one billion random sequences?
My code was this:
import random
def genSequence():
storenums = 0
numTrials = 1000000000
for x in range(0,numTrials):
numberOfAttempts = 0
getToOne = 0
while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
getToOne += random.random()
numberOfAttempts += 1
storenums = storenums + numberOfAttempts
#print (x)
#print(storenums)
calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
average = num/den
print(average)
genSequence()
*Note: I am using repl.it to run my code so there is no main.
The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.
EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.
python python-3.x time-limit-exceeded memory-optimization
add a comment |Â
up vote
8
down vote
favorite
up vote
8
down vote
favorite
My Computer Science teacher gave us a task to calculate the average after one billion trials.
His exact assignment was this:
Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)
Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.
Examples:
- 0.449 0.814
- length of sequence: 2
- 0.167 0.138 0.028 0.934
- length of sequence: 4
- 0.640 0.258 0.417
- length of sequence: 3
- 0.911 0.212
- length of sequence: 2
Average of the four lengths is 11/4 â 2.75
What is the average of one billion random sequences?
My code was this:
import random
def genSequence():
storenums = 0
numTrials = 1000000000
for x in range(0,numTrials):
numberOfAttempts = 0
getToOne = 0
while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
getToOne += random.random()
numberOfAttempts += 1
storenums = storenums + numberOfAttempts
#print (x)
#print(storenums)
calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
average = num/den
print(average)
genSequence()
*Note: I am using repl.it to run my code so there is no main.
The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.
EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.
python python-3.x time-limit-exceeded memory-optimization
My Computer Science teacher gave us a task to calculate the average after one billion trials.
His exact assignment was this:
Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)
Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.
Examples:
- 0.449 0.814
- length of sequence: 2
- 0.167 0.138 0.028 0.934
- length of sequence: 4
- 0.640 0.258 0.417
- length of sequence: 3
- 0.911 0.212
- length of sequence: 2
Average of the four lengths is 11/4 â 2.75
What is the average of one billion random sequences?
My code was this:
import random
def genSequence():
storenums = 0
numTrials = 1000000000
for x in range(0,numTrials):
numberOfAttempts = 0
getToOne = 0
while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
getToOne += random.random()
numberOfAttempts += 1
storenums = storenums + numberOfAttempts
#print (x)
#print(storenums)
calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
average = num/den
print(average)
genSequence()
*Note: I am using repl.it to run my code so there is no main.
The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.
EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.
python python-3.x time-limit-exceeded memory-optimization
asked Feb 8 at 14:38
PGODULTIMATE
985
985
add a comment |Â
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
9
down vote
accepted
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read â it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
2
Additionaly, instead of writing10**9
, one could write1_000_000_000
.
â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values oftrials
and extrapolate to work out how long it is going to take.
â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
9
down vote
accepted
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read â it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
2
Additionaly, instead of writing10**9
, one could write1_000_000_000
.
â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values oftrials
and extrapolate to work out how long it is going to take.
â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
add a comment |Â
up vote
9
down vote
accepted
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read â it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
2
Additionaly, instead of writing10**9
, one could write1_000_000_000
.
â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values oftrials
and extrapolate to work out how long it is going to take.
â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
add a comment |Â
up vote
9
down vote
accepted
up vote
9
down vote
accepted
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read â it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read â it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
edited Feb 9 at 10:51
answered Feb 8 at 15:50
Gareth Rees
41.1k394168
41.1k394168
2
Additionaly, instead of writing10**9
, one could write1_000_000_000
.
â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values oftrials
and extrapolate to work out how long it is going to take.
â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
add a comment |Â
2
Additionaly, instead of writing10**9
, one could write1_000_000_000
.
â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values oftrials
and extrapolate to work out how long it is going to take.
â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
2
2
Additionaly, instead of writing
10**9
, one could write 1_000_000_000
.â Mathias Ettinger
Feb 8 at 16:13
Additionaly, instead of writing
10**9
, one could write 1_000_000_000
.â Mathias Ettinger
Feb 8 at 16:13
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â PGODULTIMATE
Feb 8 at 23:49
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of
trials
and extrapolate to work out how long it is going to take.â Gareth Rees
Feb 8 at 23:59
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of
trials
and extrapolate to work out how long it is going to take.â Gareth Rees
Feb 8 at 23:59
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
I measured preformance an it says 63.16667 minutes.
â PGODULTIMATE
Feb 9 at 17:18
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â PGODULTIMATE
Feb 9 at 17:40
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187094%2fcalculating-the-average-minimum-length-of-random-numbers-from-0-1-that-adds-up%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password