Function to calculate Persistence Rate with optional group_by variable and logical arguments
The Function
"Persistence" is sometimes also referred to as "retention". The persistence rate is the share of units (IDs) in a given term/period that are also found in the subsequent term/period. So, if I have 10 customers in period 1, and 3 of those customers return in period 2, my persistence rate is 30%.
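As a minimal sketch of that arithmetic in base R (the customer IDs below are made up for illustration and are not part of my data):

```r
# Hypothetical IDs: 10 customers in period 1, 3 of whom return in period 2.
period_1 <- sprintf("c%02d", 1:10)
period_2 <- c("c02", "c05", "c09", "c11")  # "c11" is new, so it is ignored

# IDs from period 1 that are also found in period 2.
persisted <- intersect(period_1, period_2)

# Share of the period 1 cohort that persisted.
persistence_rate <- length(persisted) / length(period_1)
print(persistence_rate)
```

The denominator is the size of the earlier period's cohort, so customers who first appear in period 2 do not affect the rate.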
I have written a function that will either:
Calculate the persistence rate for each period's cohort of IDs, if calculate = TRUE.
Create an indicator variable on the original dataframe that identifies whether the ID persisted (1) or not (0), if calculate = FALSE.
Furthermore, if overall = TRUE when calculate = TRUE, it will include the persistence rate over all of the terms.
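To illustrate what the indicator variable looks like, here is a rough base R sketch. It is not the code of my function: it assumes rank is made of consecutive integers, the names visits and persisted are made up, and coding the final period as 0 (it has no subsequent period) is a choice made only for this sketch.

```r
# Hypothetical data: one row per ID per period in which the ID appears.
visits <- data.frame(
  id = c("1", "2", "3", "4", "1", "2", "3", "1", "2"),
  rank = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Key for each observed (id, rank) pair, and the key the same ID would
# have in the next period.
key_now <- paste(visits$id, visits$rank)
key_next <- paste(visits$id, visits$rank + 1)

# 1 if the ID is also found in the subsequent period, 0 otherwise.
visits$persisted <- as.integer(key_next %in% key_now)
print(visits)
```

Averaging persisted within each rank then gives that cohort's persistence rate.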
The Arguments
Here is a brief description of each of the arguments:
df (REQUIRED): The dataframe that contains the columns named in the other arguments.
id (REQUIRED): This is the unique identification of the observational unit of interest. (Customer ID, Product ID, Student ID, etc.)
rank (REQUIRED): This is the numeric or ordered factor argument that defines the sequence of periods.
period (OPTIONAL): This is the "label" or more interpretative version of rank. Essentially just makes output pretty, if desired. (e.g., "October" is the period, 10 is the ranking number for October)
... (OPTIONAL): Variables to group_by in case a comparison of persistence rates across groups is desired.
overall (REQUIRED w/ DEFAULT): Logical variable to decide whether or not to include an "overall" persistence rate calculation.
calculate (REQUIRED w/ DEFAULT): Logical variable to decide whether to summarize the data into persistence rates, or to create an indicator variable denoting persistence.
Perceived Improvement Areas
Of course, any and all suggestions for ways to improve this function are greatly appreciated. I do, however, have some areas that I think could be improved; I'm just not sure how.
Grouping the Optional period Argument: In the section that describes what to do if calculate == TRUE, I had to create an if statement to group the variables differently depending on whether the period argument was supplied. Before, there was only one group_by argument, and if I explicitly called all of the arguments, the function would work great. But when I only called the first 3 required arguments, I would get an error. The current version works fine, but is there a better way to conditionally group optional variables?
Conditional overall Argument: In order to calculate the overall persistence, it seems like I have to repeat a lot of code, which could be computationally expensive, and is a little less easy to read than one continuous dplyr chain would be. Is there a more code-efficient way to calculate the overall rate?
What I've Already Tried
I tried to make things a little more efficient by creating the indicator variable first, whether or not calculate == TRUE. Then I just summarised the persistence_indicator by group. But when I used system.time() to compare performance before and after, my current function was more efficient in almost every combination of arguments. In retrospect, this makes sense: why create that variable if I don't need it when calculate == TRUE?
I also tried posting an earlier version of my function here on Code Review, just to be completely transparent. It didn't get much attention, which is probably fine since the function has changed so much. But I am still interested in general best practices for improving code, especially as it relates to conditionals.
Sample Data
dataFrame <- data.frame(id = as.character(c(1, 2, 3, 4, 1, 2, 3, 1, 2)),
                        period = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
                        rank = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                        group = c(1, 2, 1, 2, 1, 2, 1, 1, 2),
                        stringsAsFactors = FALSE)
The Function Code
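persistence <- function(df, id, rank, period, ..., overall = TRUE, calculate = TRUE) {
  # df, id, and rank are required.
  stopifnot(!missing(df), !missing(id), !missing(rank))
  # Record whether the optional period label was supplied.
  period_missing <- missing(period)
  # Capture the column arguments as quosures for use inside dplyr verbs.
  enq_id <- enquo(id)
  enq_rank <- enquo(rank)
  enq_period <- enquo(period)
  enq_group_var <- quos(...)
  # Check whether rank evaluates to a numeric column of df.
  valid_rank_type <- is.numeric(rlang::eval_tidy(enq_rank, df))
  # (The rest of the function body is missing from this copy of the post.)
}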
Sample Function Call, Output, and sessionInfo()
library(dplyr)
persistence(df = dataFrame,
            id = id,
            rank = rank,
            period = period,
            group,
            overall = TRUE,
            calculate = TRUE)
# A tibble: 4 x 6
  group  rank period persistence_rate count   overall
  <dbl> <dbl> <chr>             <dbl> <int>     <dbl>
1     1     1 A                   1.0     2 0.7142857
2     2     1 A                   0.5     2 0.7142857
3     1     2 B                   0.5     2 0.7142857
4     2     2 B                   1.0     1 0.7142857
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.6
loaded via a namespace (and not attached):
[1] tidyselect_0.2.3 compiler_3.4.2 magrittr_1.5 assertthat_0.2.0 R6_2.2.2
[6] tools_3.4.2 glue_1.2.0 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.17
[11] pkgconfig_2.0.1 rlang_0.2.1 purrr_0.2.4 bindr_0.1.1
Final Note
The data I use interactively to test this function has about 15,000 rows, so when I mentioned performance above using system.time(), it was with much more data than the sample data I have provided. The sample data works just fine.
r
What exactly do you want to improve in this function?
– minem
Jul 16 at 11:23
I want to improve this function by using code that follows best practices. So, what are the best practices for incorporating optional arguments into a dplyr group_by() chain? And what are the best practices for incorporating a conditional statement, as directed by an argument? Does my code follow those practices, or does it get close? Is there a reason I shouldn't use the method I did?
– MillionC
Jul 16 at 15:19
asked Jul 13 at 18:33 by MillionC