Regression on Pandas DataFrame

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
2
down vote

favorite












I am working on the following assignment and I am a bit lost:




Build a regression model that will predict the rating score of each
product based on attributes which correspond to some very common words
used in the reviews (selection of how many words is left to you as a
decision). So, for each product you will have a long(ish) vector of
attributes based on how many times each word appears in reviews of
this product. Your target variable is the rating. You will be judged
on the process of building the model (regularization, subset
selection, validation set, etc.) and not so much on the accuracy of
the results.




This is what I currently have as code (I use mord's API for the regression model since Rating is categorical):



# Create word matrix
bow = df.Review2.str.split().apply(pd.Series.value_counts)
rating = df['Rating']
df_rating = pd.DataFrame([rating])
df_rating = df_rating.transpose()
bow = bow.join(df_rating)

# Remove some columns and rows
bow = bow.loc[(bow['Rating'].notna()), ~(bow.sum(0) < 80)]

# Divide into train - validation - test
bow.fillna(0, inplace=True)
rating = bow['Rating']
bow = bow.drop('Rating', 1)
x_train, x_test, y_train, y_test = train_test_split(bow, rating, test_size=0.4, random_state=0)

# Run regression
regr = m.OrdinalRidge()
regr.fit(x_train, y_train)
scores = cross_val_score(regr, bow, rating, cv=5, scoring='accuracy')
# scores -> array([0.75438596, 0.73684211, 0.66071429, 0.53571429, 0.60714286])
# avg_score -> Accuracy: 0.66 (+/- 0.16)


Could I have some constructive criticism on the above code please (regarding what I am missing from the posed task)?




This is what bow looks like:



enter image description here



If you need any other information just comment I will happily edit it in.



P.S: I've been working on this the past few days, this is my first regression model I code. I'll be honest, I didn't find it easy. So if someone could lend a helping hand I would be so ever grateful. Lastly, I want to be clear that I don't post to get a complete answer, code-wise (even if I would not say no to that). I am merely hoping in some guidance. My deadline approaches fast :P







share|improve this question





















  • The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
    – Mathias Ettinger
    Jul 15 at 10:26










  • No, I believe that the code works correctly but I am wondering if I "hit all the targets".
    – Stefano Pozzi
    Jul 15 at 11:32
















up vote
2
down vote

favorite












I am working on the following assignment and I am a bit lost:




Build a regression model that will predict the rating score of each
product based on attributes which correspond to some very common words
used in the reviews (selection of how many words is left to you as a
decision). So, for each product you will have a long(ish) vector of
attributes based on how many times each word appears in reviews of
this product. Your target variable is the rating. You will be judged
on the process of building the model (regularization, subset
selection, validation set, etc.) and not so much on the accuracy of
the results.




This is what I currently have as code (I use mord's API for the regression model since Rating is categorical):



# Create word matrix
bow = df.Review2.str.split().apply(pd.Series.value_counts)
rating = df['Rating']
df_rating = pd.DataFrame([rating])
df_rating = df_rating.transpose()
bow = bow.join(df_rating)

# Remove some columns and rows
bow = bow.loc[(bow['Rating'].notna()), ~(bow.sum(0) < 80)]

# Divide into train - validation - test
bow.fillna(0, inplace=True)
rating = bow['Rating']
bow = bow.drop('Rating', 1)
x_train, x_test, y_train, y_test = train_test_split(bow, rating, test_size=0.4, random_state=0)

# Run regression
regr = m.OrdinalRidge()
regr.fit(x_train, y_train)
scores = cross_val_score(regr, bow, rating, cv=5, scoring='accuracy')
# scores -> array([0.75438596, 0.73684211, 0.66071429, 0.53571429, 0.60714286])
# avg_score -> Accuracy: 0.66 (+/- 0.16)


Could I have some constructive criticism on the above code please (regarding what I am missing from the posed task)?




This is what bow looks like:



enter image description here



If you need any other information just comment I will happily edit it in.



P.S: I've been working on this the past few days, this is my first regression model I code. I'll be honest, I didn't find it easy. So if someone could lend a helping hand I would be so ever grateful. Lastly, I want to be clear that I don't post to get a complete answer, code-wise (even if I would not say no to that). I am merely hoping in some guidance. My deadline approaches fast :P







share|improve this question





















  • The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
    – Mathias Ettinger
    Jul 15 at 10:26










  • No, I believe that the code works correctly but I am wondering if I "hit all the targets".
    – Stefano Pozzi
    Jul 15 at 11:32












up vote
2
down vote

favorite









up vote
2
down vote

favorite











I am working on the following assignment and I am a bit lost:




Build a regression model that will predict the rating score of each
product based on attributes which correspond to some very common words
used in the reviews (selection of how many words is left to you as a
decision). So, for each product you will have a long(ish) vector of
attributes based on how many times each word appears in reviews of
this product. Your target variable is the rating. You will be judged
on the process of building the model (regularization, subset
selection, validation set, etc.) and not so much on the accuracy of
the results.




This is what I currently have as code (I use mord's API for the regression model since Rating is categorical):



# Create word matrix
bow = df.Review2.str.split().apply(pd.Series.value_counts)
rating = df['Rating']
df_rating = pd.DataFrame([rating])
df_rating = df_rating.transpose()
bow = bow.join(df_rating)

# Remove some columns and rows
bow = bow.loc[(bow['Rating'].notna()), ~(bow.sum(0) < 80)]

# Divide into train - validation - test
bow.fillna(0, inplace=True)
rating = bow['Rating']
bow = bow.drop('Rating', 1)
x_train, x_test, y_train, y_test = train_test_split(bow, rating, test_size=0.4, random_state=0)

# Run regression
regr = m.OrdinalRidge()
regr.fit(x_train, y_train)
scores = cross_val_score(regr, bow, rating, cv=5, scoring='accuracy')
# scores -> array([0.75438596, 0.73684211, 0.66071429, 0.53571429, 0.60714286])
# avg_score -> Accuracy: 0.66 (+/- 0.16)


Could I have some constructive criticism on the above code please (regarding what I am missing from the posed task)?




This is what bow looks like:



enter image description here



If you need any other information just comment I will happily edit it in.



P.S: I've been working on this the past few days, this is my first regression model I code. I'll be honest, I didn't find it easy. So if someone could lend a helping hand I would be so ever grateful. Lastly, I want to be clear that I don't post to get a complete answer, code-wise (even if I would not say no to that). I am merely hoping in some guidance. My deadline approaches fast :P







share|improve this question













I am working on the following assignment and I am a bit lost:




Build a regression model that will predict the rating score of each
product based on attributes which correspond to some very common words
used in the reviews (selection of how many words is left to you as a
decision). So, for each product you will have a long(ish) vector of
attributes based on how many times each word appears in reviews of
this product. Your target variable is the rating. You will be judged
on the process of building the model (regularization, subset
selection, validation set, etc.) and not so much on the accuracy of
the results.




This is what I currently have as code (I use mord's API for the regression model since Rating is categorical):



# Create word matrix
bow = df.Review2.str.split().apply(pd.Series.value_counts)
rating = df['Rating']
df_rating = pd.DataFrame([rating])
df_rating = df_rating.transpose()
bow = bow.join(df_rating)

# Remove some columns and rows
bow = bow.loc[(bow['Rating'].notna()), ~(bow.sum(0) < 80)]

# Divide into train - validation - test
bow.fillna(0, inplace=True)
rating = bow['Rating']
bow = bow.drop('Rating', 1)
x_train, x_test, y_train, y_test = train_test_split(bow, rating, test_size=0.4, random_state=0)

# Run regression
regr = m.OrdinalRidge()
regr.fit(x_train, y_train)
scores = cross_val_score(regr, bow, rating, cv=5, scoring='accuracy')
# scores -> array([0.75438596, 0.73684211, 0.66071429, 0.53571429, 0.60714286])
# avg_score -> Accuracy: 0.66 (+/- 0.16)


Could I have some constructive criticism on the above code please (regarding what I am missing from the posed task)?




This is what bow looks like:



enter image description here



If you need any other information just comment I will happily edit it in.



P.S: I've been working on this the past few days, this is my first regression model I code. I'll be honest, I didn't find it easy. So if someone could lend a helping hand I would be so ever grateful. Lastly, I want to be clear that I don't post to get a complete answer, code-wise (even if I would not say no to that). I am merely hoping in some guidance. My deadline approaches fast :P









share|improve this question












share|improve this question




share|improve this question








edited Jul 15 at 2:16









200_success

123k14143399




123k14143399









asked Jul 14 at 15:15









Stefano Pozzi

112




112











  • The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
    – Mathias Ettinger
    Jul 15 at 10:26










  • No, I believe that the code works correctly but I am wondering if I "hit all the targets".
    – Stefano Pozzi
    Jul 15 at 11:32
















  • The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
    – Mathias Ettinger
    Jul 15 at 10:26










  • No, I believe that the code works correctly but I am wondering if I "hit all the targets".
    – Stefano Pozzi
    Jul 15 at 11:32















The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
– Mathias Ettinger
Jul 15 at 10:26




The wording of the question feels like the code does not work as expected and you're asking about implementing missing features. Do I understand correctly?
– Mathias Ettinger
Jul 15 at 10:26












No, I believe that the code works correctly but I am wondering if I "hit all the targets".
– Stefano Pozzi
Jul 15 at 11:32




No, I believe that the code works correctly but I am wondering if I "hit all the targets".
– Stefano Pozzi
Jul 15 at 11:32















active

oldest

votes











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199491%2fregression-on-pandas-dataframe%23new-answer', 'question_page');

);

Post as a guest



































active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes










 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199491%2fregression-on-pandas-dataframe%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Chat program with C++ and SFML

Function to Return a JSON Like Objects Using VBA Collections and Arrays

Will my employers contract hold up in court?