Find the minimum value that data could have had before it was rounded
I have this function in R:
Minimum <- function(data) {
  answer <- numeric(length(data))
  diference <- c(0, diff(data, lag = 1, differences = 1))  # padded with a leading 0
  answer[1] <- data[1]
  for (i in 2:length(diference)) {
    if (diference[i] == 0) {
      answer[i] <- answer[i - 1]
    } else {
      answer[i] <- data[i] - diference[i] / 2
    }
  }
  return(answer)
}
Its purpose is to find the minimum value that "data" could have had before it was rounded.
For each element, the minimum possible value is the midpoint between the current value and the previous distinct value, i.e. the average of the two values on either side of the last change of value in "data".
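For example, with the data from the discussion below (values rounded to one decimal):

data <- c(1.1, 1.2, 1.3, 1.3)
Minimum(data)
#> [1] 1.10 1.15 1.25 1.25

The first element has no preceding change of value, so the function falls back to data[1] itself.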
This code works, but since for loops are inefficient in R, it is advised to vectorize the function.
The problem is that each element of "answer" depends on the previous elements of "answer", so I cannot express it as a simple element-wise (lambda) function.
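One loop-free way to express that dependence is Reduce with accumulate = TRUE, which folds over the positions while carrying the previous answer along. A minimal sketch (an editor's illustration, not from the original post; Minimum_reduce is a hypothetical name, and this is loop-free rather than truly vectorized):

Minimum_reduce <- function(data) {
  diference <- c(0, diff(data))
  Reduce(
    function(prev, i) {
      # same recurrence as the loop body above
      if (diference[i] == 0) prev else data[i] - diference[i] / 2
    },
    x = seq_along(data)[-1],  # positions 2..n
    init = data[1],
    accumulate = TRUE         # keep every intermediate answer
  )
}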
Tags: r, vectorization

asked Apr 16 at 16:03 by yoxota; edited Apr 16 at 16:51 by Sam Onela
I don't understand what this has to do with rounding, or the meaning of "The minimum possible value is the average of the values which 'data' had at the last change of value in 'data'". Could you elaborate? Also, diference[i] == 0 will be subject to floating-point errors, so it is not reliable if you are dealing with numeric (non-integer) vectors.
– flodel, Apr 16 at 23:01
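(A common remedy, sketched here as an illustration rather than taken from the thread: compare against a small tolerance instead of exact zero. near_zero and tol are hypothetical names.)

near_zero <- function(x, tol = sqrt(.Machine$double.eps)) abs(x) < tol
# inside the loop, use near_zero(diference[i]) instead of diference[i] == 0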
For example, the numbers [1.14, 1.23, 1.28, 1.35] could be rounded to [1.1, 1.2, 1.3, 1.3]. The minimum possible values would be [?, 1.15, 1.25, 1.25]; otherwise they would not have been rounded that way. Even worse, in this example I know the modulus/precision of the rounding, but in real life it is unknown how the values were rounded. They could have been rounded to multiples of pi, or who knows what number.
– yoxota, Apr 17 at 14:26
Thanks for the explanation. If I understand correctly, it assumes your input data is increasing. If so, you might want to check that assumption in your function, something like stopifnot(all(diff(data) >= 0)).
– flodel, Apr 17 at 23:53
I gave it some thought. Should you not look for the smallest value in diff(data) that is not exactly zero and make that your (estimated) rounding precision for all values?

Minimum <- function(data) {
  d <- diff(data)
  p <- min(d[d > 0])
  data - p / 2
}

It's all vectorized, faster, and provides a better (larger) minimum bound on your pre-rounded data.
– flodel, Apr 18 at 0:01
@flodel Sorry for the delay. Looking for the smallest difference is complicated, because small values are noisy, so the smallest differences correspond to noise around a difference of 0. It looks like the rounding scales up with the value.
– yoxota, Apr 25 at 13:57
1 Answer

Accepted answer (score 4)
You can first assign values only where difference != 0, and then use na.locf from the zoo package to fill the remaining NAs with the last available (carried-forward) value.
minimum_new <- function(data) {
  answer <- rep(NA, length(data))
  difference <- c(0, diff(data, lag = 1, differences = 1)) / 2  # half the padded differences
  answer[1] <- data[1]
  answer[difference != 0] <- data[difference != 0] - difference[difference != 0]
  answer <- zoo::na.locf(answer, na.rm = FALSE)  # last observation carried forward
  answer
}
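As an aside (an editor's sketch, not part of the original answer): if you would rather avoid the zoo dependency, the NA-filling step can be done in base R with a cummax index trick; na_locf_base is a hypothetical helper name.

na_locf_base <- function(x) {
  # for every element, the index of the most recent non-NA position;
  # answer[1] is always set above, so the index never stays at 0
  idx <- cummax(seq_along(x) * !is.na(x))
  x[idx]
}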
The vectorised version is at least twice as fast for me.
> data <- sample(10, 10000, replace = TRUE)
> check <- function(values) all(sapply(values[-1], function(x) identical(values[[1]], x)))
> bench <- microbenchmark::microbenchmark(loop = Minimum(data), vectorised = minimum_new(data), check=check)
Unit: microseconds
expr min lq mean median uq max neval cld
loop 1401.959 1415.552 1665.816 1457.274 1586.407 4620.835 100 b
vectorised 742.325 758.183 1111.202 796.507 1383.268 2587.940 100 a
With check = check, the benchmark also verifies that both implementations produce identical output.

answered Apr 16 at 17:53 by m0nhawk; edited Apr 17 at 15:32
I would never have found the na.locf function by myself. I wonder how I could have found it. Thank you. Have a reward: youtube.com/watch?v=fD7ji3YOwcM
– yoxota, Apr 17 at 14:20