Calculating the Average Minimum Length of Random Numbers From 0

Calculating the Average Minimum Length of Random Numbers From 0 - 1 that Adds up to Greater Than 1

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
8
down vote

favorite

My Computer Science teacher gave us a task to calculate the average after one billion trials.

His exact assignment was this:

Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)

Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.

Examples:

0.449 0.814

length of sequence: 2

0.167 0.138 0.028 0.934

length of sequence: 4

0.640 0.258 0.417

length of sequence: 3

0.911 0.212

length of sequence: 2

Average of the four lengths is 11/4 Ã¢Â‰Âˆ 2.75

What is the average of one billion random sequences?

My code was this:

import random

def genSequence():
 storenums = 0
 numTrials = 1000000000
 for x in range(0,numTrials):
 numberOfAttempts = 0
 getToOne = 0 
 while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
 getToOne += random.random()
 numberOfAttempts += 1
 storenums = storenums + numberOfAttempts
 #print (x)
 #print(storenums)
 calculateAverage(storenums,numTrials)

def calculateAverage(num,den):
 average = num/den
 print(average)

genSequence()

*Note: I am using repl.it to run my code so there is no main.

The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.

EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.

asked Feb 8 at 14:38

PGODULTIMATE

985

add a commentÂ |Â

up vote
8
down vote

favorite

My Computer Science teacher gave us a task to calculate the average after one billion trials.

His exact assignment was this:

Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)

Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.

Examples:

0.449 0.814

length of sequence: 2

0.167 0.138 0.028 0.934

length of sequence: 4

0.640 0.258 0.417

length of sequence: 3

0.911 0.212

length of sequence: 2

Average of the four lengths is 11/4 Ã¢Â‰Âˆ 2.75

What is the average of one billion random sequences?

My code was this:

import random

def genSequence():
 storenums = 0
 numTrials = 1000000000
 for x in range(0,numTrials):
 numberOfAttempts = 0
 getToOne = 0 
 while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
 getToOne += random.random()
 numberOfAttempts += 1
 storenums = storenums + numberOfAttempts
 #print (x)
 #print(storenums)
 calculateAverage(storenums,numTrials)

def calculateAverage(num,den):
 average = num/den
 print(average)

genSequence()

*Note: I am using repl.it to run my code so there is no main.

EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.

asked Feb 8 at 14:38

PGODULTIMATE

985

add a commentÂ |Â

up vote
8
down vote

favorite

My Computer Science teacher gave us a task to calculate the average after one billion trials.

His exact assignment was this:

Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)

Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.

Examples:

0.449 0.814

length of sequence: 2

0.167 0.138 0.028 0.934

length of sequence: 4

0.640 0.258 0.417

length of sequence: 3

0.911 0.212

length of sequence: 2

Average of the four lengths is 11/4 Ã¢Â‰Âˆ 2.75

What is the average of one billion random sequences?

My code was this:

import random

def genSequence():
 storenums = 0
 numTrials = 1000000000
 for x in range(0,numTrials):
 numberOfAttempts = 0
 getToOne = 0 
 while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
 getToOne += random.random()
 numberOfAttempts += 1
 storenums = storenums + numberOfAttempts
 #print (x)
 #print(storenums)
 calculateAverage(storenums,numTrials)

def calculateAverage(num,den):
 average = num/den
 print(average)

genSequence()

*Note: I am using repl.it to run my code so there is no main.

EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.

asked Feb 8 at 14:38

PGODULTIMATE

985

My Computer Science teacher gave us a task to calculate the average after one billion trials.

His exact assignment was this:

Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)

Write a program that simulates these trials a billion times and calculates the average length of the sequences.
This is a neat exercise in nested loops.

Examples:

0.449 0.814

length of sequence: 2

0.167 0.138 0.028 0.934

length of sequence: 4

0.640 0.258 0.417

length of sequence: 3

0.911 0.212

length of sequence: 2

Average of the four lengths is 11/4 Ã¢Â‰Âˆ 2.75

What is the average of one billion random sequences?

My code was this:

import random

def genSequence():
 storenums = 0
 numTrials = 1000000000
 for x in range(0,numTrials):
 numberOfAttempts = 0
 getToOne = 0 
 while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
 getToOne += random.random()
 numberOfAttempts += 1
 storenums = storenums + numberOfAttempts
 #print (x)
 #print(storenums)
 calculateAverage(storenums,numTrials)

def calculateAverage(num,den):
 average = num/den
 print(average)

genSequence()

*Note: I am using repl.it to run my code so there is no main.

EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.

asked Feb 8 at 14:38

PGODULTIMATE

985

asked Feb 8 at 14:38

PGODULTIMATE

985

asked Feb 8 at 14:38

PGODULTIMATE

985

asked Feb 8 at 14:38

PGODULTIMATE

985

add a commentÂ |Â

1 Answer
1

active

oldest

votes

up vote
9
down vote

accepted

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read Ã¢Â€Â” it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.

There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.

The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.

Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.

The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.

When a loop variable like x is not used, it's conventional to name it _.

range(0,numTrials) can be written range(numTrials).

Revised code:

import random

def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).

 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

2

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187094%2fcalculating-the-average-minimum-length-of-random-numbers-from-0-1-that-adds-up%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
9
down vote

accepted

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read Ã¢Â€Â” it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.

There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.

The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.

Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.

The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.

When a loop variable like x is not used, it's conventional to name it _.

range(0,numTrials) can be written range(numTrials).

Revised code:

import random

def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).

 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

2

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

add a commentÂ |Â

up vote
9
down vote

accepted

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read Ã¢Â€Â” it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.

There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.

The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.

Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.

The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.

When a loop variable like x is not used, it's conventional to name it _.

range(0,numTrials) can be written range(numTrials).

Revised code:

import random

def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).

 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

2

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

add a commentÂ |Â

up vote
9
down vote

accepted

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read Ã¢Â€Â” it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.

There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.

The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.

Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.

The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.

When a loop variable like x is not used, it's conventional to name it _.

range(0,numTrials) can be written range(numTrials).

Revised code:

import random

def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).

 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out $10^9$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read Ã¢Â€Â” it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.

There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.

The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.

Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.

The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.

When a loop variable like x is not used, it's conventional to name it _.

range(0,numTrials) can be written range(numTrials).

Revised code:

import random

def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).

 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

edited Feb 9 at 10:51

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

answered Feb 8 at 15:50

Gareth Rees

41.1k394168

2

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

add a commentÂ |Â

2

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
â€“Â Mathias Ettinger
Feb 8 at 16:13

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
â€“Â PGODULTIMATE
Feb 8 at 23:49

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
â€“Â Gareth Rees
Feb 8 at 23:59

I measured preformance an it says 63.16667 minutes.
â€“Â PGODULTIMATE
Feb 9 at 17:18

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!
â€“Â PGODULTIMATE
Feb 9 at 17:40

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr