R function to generate predictions from ratings

























I am trying to improve the run time of a program I wrote in R. Generally, what I am doing is feeding a function a data frame of values and generating a prediction off of operations on specific columns. The function is a custom function that is being used with sapply (code below). What I'm doing is much too large to provide any meaningful example, so instead I will try to describe the inputs to the process. I know this will restrict how helpful answers can be, but I am interested in any ideas for optimizing the time it takes me to compute a prediction. Currently it is taking me about 10 seconds to generate one prediction (running the sapply for one line of a dataframe).



mean_rating <- function(df) {
  user <- df$user
  movie <- df$movie
  u_row <- which(U_lookup == user)[1]   # row of dfm for this user
  m_row <- which(M_lookup == movie)[1]  # column of dfm for this movie

  knn_match <- knn_txt[u_row, 1:100]    # the user's 100 nearest neighbours
  knn_match1 <- as.numeric(unlist(knn_match))

  dfm_test <- dfm[knn_match1, ]
  dfm_mov <- dfm_test[, m_row]  # column of dfm associated with the query movie
  mean(dfm_mov)
}

test <- sapply(1:nrow(probe_test), function(x) mean_rating(probe_test[x, ]))


Inputs: dfm is my main data matrix, users in the rows and movies in the columns. Very sparse.



> str(dfm)
Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:99072112] 378 1137 1755 1893 2359 3156 3423 4380 5103 6762 ...
..@ j : int [1:99072112] 0 0 0 0 0 0 0 0 0 0 ...
..@ Dim : int [1:2] 480189 17770
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:99072112] 4 5 4 1 4 5 4 5 3 3 ...
..@ factors : list()
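One thing worth checking is the storage format. A dgTMatrix stores plain (i, j, x) triplets, and row or column subsetting on that representation is generally slower than on a compressed format. The following is a minimal sketch, assuming the Matrix package is available; dfm_small is a made-up stand-in for dfm, and the speed-up on the real data has not been measured here:

```r
library(Matrix)

# Small made-up ratings matrix in triplet form, standing in for dfm.
# The i and j slots are 0-based, as in the str(dfm) output.
dfm_small <- new("dgTMatrix",
                 i = c(0L, 1L, 2L, 2L),
                 j = c(0L, 0L, 1L, 2L),
                 x = c(4, 5, 3, 1),
                 Dim = c(3L, 3L))

# Convert once, outside any loop, to column-compressed storage
dfm_csc <- as(dfm_small, "CsparseMatrix")

# Same kind of access as in mean_rating(): some rows, one column
ratings <- dfm_csc[c(1, 2), 1]
avg <- mean(ratings)
```

The conversion is a one-off cost, so it only pays off when the matrix is subset many times afterwards, which is the case when predicting for every row of probe_test.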


probe_test is my test set, the set I'm trying to predict for. The actual probe test contains approximately 1.4 million rows but I am trying it on a subset first to optimize the time. It is being fed into my function.



> str(probe_test)
'data.frame': 6 obs. of 6 variables:
$ X : int 1 2 3 4 5 6
$ movie : int 1 1 1 1 1 1
$ user : int 1027056 1059319 1149588 1283744 1394012 1406595
$ Rating : int 3 3 4 3 5 4
$ Rating_Date: Factor w/ 1929 levels "2000-01-06","2000-01-08",..: 1901 1847 1911 1312 1917 1803
$ Indicator : int 1 1 1 1 1 1


U_lookup is the lookup I use to convert between a user id and the row of the matrix that user is in, since we lose the user ids when the data is converted to a sparse matrix.



> str(U_lookup)
'data.frame': 480189 obs. of 1 variable:
$ x: int 10 100000 1000004 1000027 1000033 1000035 1000038 1000051 1000053 1000057 ...


M_lookup is the lookup I use to convert between movie id and the column of a matrix a movie is in for similar reasons as above.



> str(M_lookup)
'data.frame': 17770 obs. of 1 variable:
$ x: int 1 10 100 1000 10000 10001 10002 10003 10004 10005 ...
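Both lookups can be resolved for the whole test set in one call with match(), rather than one which() scan per row. A minimal sketch, assuming the ids sit in the x column as the str() output shows; the toy values below are made up for illustration:

```r
# Toy stand-ins for the real lookup tables and test set (values are made up)
U_lookup <- data.frame(x = c(10L, 100000L, 1000004L, 1000027L))
M_lookup <- data.frame(x = c(1L, 10L, 100L, 1000L))
probe_test <- data.frame(movie = c(1L, 100L, 10L),
                         user  = c(1000004L, 10L, 1000027L))

# match() returns, for each id, the position of its first occurrence in the
# lookup, which is what which(lookup == id)[1] computes one id at a time
u_rows <- match(probe_test$user, U_lookup$x)
m_cols <- match(probe_test$movie, M_lookup$x)
```

An id that is missing from the lookup comes back as NA, so it may be worth checking anyNA(u_rows) and anyNA(m_cols) before indexing the matrix.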


knn_txt contains the 100 nearest neighbors for every row of dfm.



> str(knn_txt)
'data.frame': 480189 obs. of 200 variables:


Does anyone have suggestions on how I could improve performance within R? Does anyone have other language suggestions? I am slightly familiar with Python so I've been looking into that one, but if anyone has specific tips on redoing this in Python I would be grateful as I'm very inexperienced with it.




























  • @user2355903 Can you supply code that generates or simulates your data, so we can run your code? Otherwise it is really hard to help you.
    – minem
    Jul 18 at 6:18










  • I made a new question here, if you were interested in looking at it: codereview.stackexchange.com/questions/199873/…
    – user2355903
    Jul 20 at 0:36






  • Welcome to Code Review! Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Vogel612♦
    Jul 20 at 7:18






  • So I should make a new question with example data?
    – user2355903
    Jul 20 at 13:35






  • Possible duplicate of R Function to Generate Predictions from Ratings and Save Results
    – AkselA
    Jul 25 at 15:03
























edited Jul 20 at 7:18









Vogel612♦










asked Jul 18 at 2:12









user2355903






















1 Answer






























Without your exact data I can only suggest some general improvements, mainly avoiding redundant operations.



# order data.frame by users and movies
probe_test <- probe_test[with(probe_test, order(user, movie)), ]
# initialize resulting column
probe_test$res <- rep(as.numeric(NA), nrow(probe_test))

knn_txt_red <- knn_txt[, 1:100] # reduce outside of the loop
for (user in unique(probe_test$user)) { # for each unique user
  u_row <- which(U_lookup$x == user)[1] # row of dfm for this user
  knn_match <- knn_txt_red[u_row, ]
  knn_match1 <- as.numeric(unlist(knn_match))
  userI <- probe_test$user == user
  movies <- probe_test$movie[userI] # all movies to predict for this user
  m_row <- match(movies, M_lookup$x) # column indexes, in the same order as movies
  dfm_mov <- dfm[knn_match1, m_row, drop = FALSE] # neighbours' ratings for those movies
  x <- colMeans(dfm_mov) # mean rating for each movie (column)
  probe_test[userI, 'res'] <- x # add the results to data.frame
}



As I do not have your data, this code is untested and may contain errors.



There are probably better ways to do this, but as I mentioned, it is hard to think of any without any example data.
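To try the per-user idea without the full data set, it can be run on a small simulated example. This is only a sketch under stated assumptions: the Matrix package is available, every value below is made up, the neighbour table is a plain integer matrix called knn (a hypothetical stand-in for knn_txt), and 2 neighbours are used instead of 100.

```r
library(Matrix)

set.seed(1)
n_users <- 6; n_movies <- 4; k <- 2   # tiny sizes, k neighbours instead of 100

# Simulated inputs with the same roles as in the question (all values made up)
U_lookup <- data.frame(x = c(11L, 22L, 33L, 44L, 55L, 66L))
M_lookup <- data.frame(x = c(1L, 10L, 100L, 1000L))
dfm <- Matrix(sample(0:5, n_users * n_movies, replace = TRUE),
              nrow = n_users, sparse = TRUE)
knn <- matrix(sample(n_users, n_users * k, replace = TRUE), nrow = n_users)
probe_test <- data.frame(user  = c(22L, 55L, 22L),
                         movie = c(100L, 1L, 10L))

# Translate ids to matrix positions once for all rows
u_row <- match(probe_test$user, U_lookup$x)
m_col <- match(probe_test$movie, M_lookup$x)

probe_test$res <- NA_real_
for (u in unique(u_row)) {
  rows <- which(u_row == u)                          # test rows for this user
  block <- dfm[knn[u, ], m_col[rows], drop = FALSE]  # one subset per user
  probe_test$res[rows] <- colMeans(block)            # one mean per movie
}
```

Each user's neighbour rows are pulled from the sparse matrix once, however many movies that user has in the test set, so the number of sparse subsets drops from one per test row to one per distinct user.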



























  • Thank you for the tip. I see what you're saying here: if we can avoid re-subsetting the sparse matrix multiple times, that could be very helpful. I will try to post some example data later today and see if that helps.
    – user2355903
    Jul 18 at 13:38





















answered Jul 18 at 6:57









minem











