Linear Regression on random data

I wrote a simple script that implements linear regression, as practice with numpy/pandas. It uses random data, so the weights (thetas) obviously have no real meaning. I'm looking for feedback on:

Performance

Python code style

Machine Learning code style

# Performs Linear Regression (from scratch) using randomized data
# Optimizes weights by using Gradient Descent Algorithm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)

features = 3
trainingSize = 10 ** 1
trainingSteps = 10 ** 3
learningRate = 10 ** -2

randData = np.random.rand(trainingSize, features + 1)
colNames = [f'feature{i}' for i in range(1, features + 1)]
colNames.append('labels')

dummy_column = pd.Series(np.ones(trainingSize), name='f0')
df = pd.DataFrame(randData, columns=colNames)

X = pd.concat([dummy_column, df.drop(columns='labels')], axis=1)
y = df['labels']
thetas = np.random.rand(features + 1)

cost = lambda thetas: np.mean((np.matmul(X, thetas) - y) ** 2) / 2
dJdtheta = lambda thetas, k: np.mean((np.matmul(X, thetas) - y) * X.iloc[:, k])
gradient = lambda thetas: np.array([dJdtheta(thetas, k) for k in range(X.shape[1])])

# J(theta) before gradient descent
print(cost(thetas))

# Perform gradient descent
errors = np.zeros(trainingSteps)
for step in range(trainingSteps):
    thetas -= learningRate * gradient(thetas)
    errors[step] = cost(thetas)

# J(theta) after gradient descent
print(cost(thetas))

# Plots Cost function as gradient descent runs
plt.plot(errors)
plt.xlabel('Training Steps')
plt.ylabel('Cost Function')
plt.show()

edited Mar 4 at 14:40

200_success

123k14142399

asked Mar 4 at 13:53

Vivek Jha

533

1 Answer

Welcome!

Your first two lines are nice comments. Consider putting them in a module docstring:

"""Performs Linear Regression (from scratch) using randomized data.

Optimizes weights by using Gradient Descent Algorithm.
"""

Consider adding random noise to something linear (or to some "wrong model" sine or polynomial), rather than to a constant.
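To make that suggestion concrete, here is a minimal sketch of labels built from a linear function plus noise. The "true" weights, the noise scale, and the training size are hypothetical values picked only for illustration, and np.linalg.lstsq is used as a reference fit rather than the gradient descent from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 3
training_size = 100

# Hypothetical "true" weights, chosen only for illustration (intercept first).
true_thetas = np.array([2.0, -1.0, 0.5, 3.0])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(training_size),
                     rng.random((training_size, n_features))])

# Labels are a linear function of the features plus small Gaussian noise.
noise = rng.normal(scale=0.1, size=training_size)
y = X @ true_thetas + noise

# Closed-form least-squares fit, usable as a reference for gradient descent.
fitted, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(fitted, 2))
```

With data like this, the weights found by gradient descent can be compared against known values instead of being meaningless.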

np.random.seed(0)

Nice - reproducibility is Good.

trainingSize = 10 ** 1
trainingSteps = 10 ** 3
learningRate = 10 ** -2

These expressions are correct and clear. But why evaluate an exponentiation when you could just write a literal: 1e1, 1e3, 1e-2? (This advice applies in many languages, including Python. And yes, I actually prefer seeing the two integers written as floating point literals, even if that forces me to call int() on them.)
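A small sketch of what the literal spelling could look like, using snake_case names; the int() calls are needed because 1e1 and 1e3 are floats, while range() and array shapes expect integers.

```python
training_size = int(1e1)    # number of training rows
training_steps = int(1e3)   # number of gradient descent iterations
learning_rate = 1e-2        # step size, stays a float

# range() rejects floats, hence the int() conversions above.
steps_run = 0
for step in range(training_steps):
    steps_run += 1
```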

PEP8 asks that you spell it training_size, and so on. Please run flake8, and follow its advice.

Your column names expression is fine. Consider handling the one-origin within the format expression:

col_names = [f'feature{i + 1}' for i in range(features)] + ['labels']

Specifying axis=1 is correct. I have a (weak) preference for explicitly spelling out: axis='columns'.
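For illustration, a minimal sketch of the spelled-out axis name with pd.concat. The frame and column names here are stand-ins modelled on the question, not the original data.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's feature frame.
feature_frame = pd.DataFrame(np.random.rand(4, 2),
                             columns=['feature1', 'feature2'])
dummy_column = pd.Series(np.ones(4), name='f0')

# axis='columns' is equivalent to axis=1, but says what it does.
X = pd.concat([dummy_column, feature_frame], axis='columns')
```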

Consider hoisting the expression np.matmul(X, thetas) - y, so it is only evaluated once.
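One possible way to do the hoisting is sketched below. It assumes X and y are plain NumPy arrays (for a DataFrame, X.to_numpy() would give one), and the function name cost_and_gradient is made up for this example. Computing the residual once also lets all partial derivatives come from a single matrix product.

```python
import numpy as np


def cost_and_gradient(X, y, thetas):
    """Return J(theta) and its gradient, computing the residual only once."""
    residual = X @ thetas - y             # shape (m,)
    cost = np.mean(residual ** 2) / 2
    gradient = X.T @ residual / len(y)    # shape (n,), all partials at once
    return cost, gradient


# Usage on random data shaped like the question's.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.random((10, 3))])
y = rng.random(10)
thetas = rng.random(4)

initial_cost, _ = cost_and_gradient(X, y, thetas)
for _ in range(1000):
    _, grad = cost_and_gradient(X, y, thetas)
    thetas -= 0.01 * grad
final_cost, _ = cost_and_gradient(X, y, thetas)
```

Each entry of X.T @ residual / len(y) equals the mean of residual times the matching column of X, so this is the same gradient as the per-column version, just without recomputing the matrix product for every k.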

The three lambda expressions are fine, but they don't seem to buy you anything. Probably better to use def three times.
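A sketch of what the three definitions might look like as ordinary functions. Passing X and y explicitly instead of reading globals is an extra choice made here, not something the original script does, and NumPy arrays are assumed.

```python
import numpy as np


def cost(X, y, thetas):
    """Half the mean squared error, J(theta)."""
    return np.mean((X @ thetas - y) ** 2) / 2


def dJ_dtheta(X, y, thetas, k):
    """Partial derivative of J with respect to thetas[k]."""
    return np.mean((X @ thetas - y) * X[:, k])


def gradient(X, y, thetas):
    """Vector of all partial derivatives of J."""
    return np.array([dJ_dtheta(X, y, thetas, k) for k in range(X.shape[1])])
```

Named functions get docstrings and readable tracebacks, which a lambda bound to a name does not offer.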

Ship it! But do consider noising a linear function, to make it easier to evaluate your results.

answered May 15 at 3:17

J_H

4,317129
