Probability of event using Central Limit Theorem + plotting results

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
5
down vote

favorite

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be
approximated with Laplace distribution. Profits (of any trade) have a
mean of $95.70 and a std. dev. of $1,247. Your team makes about 100
trades every week.

Questions:

A. What is the probability of my team making a loss in any
given week?
B. What is the probability of my team making over $20,000
in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np

# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning $20k a weak by 100 trades

mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT

# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)

Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)

P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)

print('A. Probability of making loss in any given week is', '0:.4g'.format(P1*100) + '%')
print('B. Probability of making over $20k in any given week is', '0:.4g'.format(P2*100) + '%')

# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$mu=' + str(mu_1) + ', sigma=' + str(sigma_1) + '$')
 plt.show()

x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2)

edited Feb 12 at 15:33

Graipher

20.5k43081

asked Feb 8 at 23:21

Mateusz Konopelski

654

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?
â€“Â Graipher
Feb 12 at 6:13

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.
â€“Â Mateusz Konopelski
Feb 12 at 14:48

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss
â€“Â Graipher
Feb 12 at 15:30

add a commentÂ |Â

up vote
5
down vote

favorite

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be
approximated with Laplace distribution. Profits (of any trade) have a
mean of $95.70 and a std. dev. of $1,247. Your team makes about 100
trades every week.

Questions:

A. What is the probability of my team making a loss in any
given week?
B. What is the probability of my team making over $20,000
in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np

# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning $20k a weak by 100 trades

mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT

# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)

Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)

P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)

print('A. Probability of making loss in any given week is', '0:.4g'.format(P1*100) + '%')
print('B. Probability of making over $20k in any given week is', '0:.4g'.format(P2*100) + '%')

# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$mu=' + str(mu_1) + ', sigma=' + str(sigma_1) + '$')
 plt.show()

x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2)

edited Feb 12 at 15:33

Graipher

20.5k43081

asked Feb 8 at 23:21

Mateusz Konopelski

654

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?
â€“Â Graipher
Feb 12 at 6:13

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.
â€“Â Mateusz Konopelski
Feb 12 at 14:48

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss
â€“Â Graipher
Feb 12 at 15:30

add a commentÂ |Â

up vote
5
down vote

favorite

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be
approximated with Laplace distribution. Profits (of any trade) have a
mean of $95.70 and a std. dev. of $1,247. Your team makes about 100
trades every week.

Questions:

A. What is the probability of my team making a loss in any
given week?
B. What is the probability of my team making over $20,000
in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np

# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning $20k a weak by 100 trades

mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT

# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)

Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)

P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)

print('A. Probability of making loss in any given week is', '0:.4g'.format(P1*100) + '%')
print('B. Probability of making over $20k in any given week is', '0:.4g'.format(P2*100) + '%')

# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$mu=' + str(mu_1) + ', sigma=' + str(sigma_1) + '$')
 plt.show()

x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2)

edited Feb 12 at 15:33

Graipher

20.5k43081

asked Feb 8 at 23:21

Mateusz Konopelski

654

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be
approximated with Laplace distribution. Profits (of any trade) have a
mean of $95.70 and a std. dev. of $1,247. Your team makes about 100
trades every week.

Questions:

A. What is the probability of my team making a loss in any
given week?
B. What is the probability of my team making over $20,000
in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np

# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning $20k a weak by 100 trades

mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT

# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)

Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)

P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)

print('A. Probability of making loss in any given week is', '0:.4g'.format(P1*100) + '%')
print('B. Probability of making over $20k in any given week is', '0:.4g'.format(P2*100) + '%')

# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$mu=' + str(mu_1) + ', sigma=' + str(sigma_1) + '$')
 plt.show()

x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2)

edited Feb 12 at 15:33

Graipher

20.5k43081

asked Feb 8 at 23:21

Mateusz Konopelski

654

edited Feb 12 at 15:33

Graipher

20.5k43081

edited Feb 12 at 15:33

Graipher

20.5k43081

edited Feb 12 at 15:33

Graipher

20.5k43081

asked Feb 8 at 23:21

Mateusz Konopelski

654

asked Feb 8 at 23:21

Mateusz Konopelski

654

asked Feb 8 at 23:21

Mateusz Konopelski

654

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?
â€“Â Graipher
Feb 12 at 6:13

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.
â€“Â Mateusz Konopelski
Feb 12 at 14:48

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss
â€“Â Graipher
Feb 12 at 15:30

add a commentÂ |Â

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?
â€“Â Graipher
Feb 12 at 6:13

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.
â€“Â Mateusz Konopelski
Feb 12 at 14:48

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss
â€“Â Graipher
Feb 12 at 15:30

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?
â€“Â Graipher
Feb 12 at 6:13

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.
â€“Â Mateusz Konopelski
Feb 12 at 14:48

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss
â€“Â Graipher
Feb 12 at 15:30

add a commentÂ |Â

1 Answer
1

active

oldest

votes

up vote
4
down vote

accepted

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: 0:.4g%'.format(P1*100)
title2 = 'Probability of earning more than $20k: 0:.4g%'.format(P2*100)

You could also use the % formatting option for floats (described in the fifth table here), which automatically multiplies with 100 and adds the % sign. The number of digits after the decimal point work differently, though, compared to the g format:

":.4%".format(0.5)
# '50.0000%'
":.4g%".format(0.5*100.)
# '50%'
":.4g%".format(0.5123123*100.)
# '51.23%'
":.4%".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something

GLOBAL_CONSTANT = "value"

class Definitions

def functions()

if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

That last part is there to protect your code from being executed if you want to import some part of this script from another script. At the moment if you want to do from laplace import Z, your whole code would run. With the if __name__ == '__main__' guard, it will only run when you execute this script.

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

Yes, it is justified to use the normal distribution here, since the Central Limit Theorem guarantees that the sum of random variables tends towards a normal distribution. The approximation is quite good for $N = 100$, as can be seen with the following small script:

import numpy as np
import matplotlib.pyplot as plt

mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate

x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)

plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187132%2fprobability-of-event-using-central-limit-theorem-plotting-results%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
4
down vote

accepted

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: 0:.4g%'.format(P1*100)
title2 = 'Probability of earning more than $20k: 0:.4g%'.format(P2*100)

":.4%".format(0.5)
# '50.0000%'
":.4g%".format(0.5*100.)
# '50%'
":.4g%".format(0.5123123*100.)
# '51.23%'
":.4%".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something

GLOBAL_CONSTANT = "value"

class Definitions

def functions()

if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

import numpy as np
import matplotlib.pyplot as plt

mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate

x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)

plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

add a commentÂ |Â

up vote
4
down vote

accepted

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: 0:.4g%'.format(P1*100)
title2 = 'Probability of earning more than $20k: 0:.4g%'.format(P2*100)

":.4%".format(0.5)
# '50.0000%'
":.4g%".format(0.5*100.)
# '50%'
":.4g%".format(0.5123123*100.)
# '51.23%'
":.4%".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something

GLOBAL_CONSTANT = "value"

class Definitions

def functions()

if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

import numpy as np
import matplotlib.pyplot as plt

mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate

x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)

plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

add a commentÂ |Â

up vote
4
down vote

accepted

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: 0:.4g%'.format(P1*100)
title2 = 'Probability of earning more than $20k: 0:.4g%'.format(P2*100)

":.4%".format(0.5)
# '50.0000%'
":.4g%".format(0.5*100.)
# '50%'
":.4g%".format(0.5123123*100.)
# '51.23%'
":.4%".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something

GLOBAL_CONSTANT = "value"

class Definitions

def functions()

if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

import numpy as np
import matplotlib.pyplot as plt

mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate

x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)

plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '0:.4g'.format(P1*100) + '%'
title2 = 'Probability of earning more than $20k: ' + '0:.4g'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: 0:.4g%'.format(P1*100)
title2 = 'Probability of earning more than $20k: 0:.4g%'.format(P2*100)

":.4%".format(0.5)
# '50.0000%'
":.4g%".format(0.5*100.)
# '50%'
":.4g%".format(0.5123123*100.)
# '51.23%'
":.4%".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something

GLOBAL_CONSTANT = "value"

class Definitions

def functions()

if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

import numpy as np
import matplotlib.pyplot as plt

mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate

x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)

plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

edited Feb 13 at 13:14

answered Feb 12 at 15:40

Graipher

20.5k43081

answered Feb 12 at 15:40

Graipher

20.5k43081

answered Feb 12 at 15:40

Graipher

20.5k43081

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr