kevinushey/RcppRoll

Argument to pad with NAs when n > 1

datalove opened this issue · 11 comments

Hi Kevin, just wondering if you might consider including an option to pad the returned vectors/matrices with NAs when n > 1?

I'd love to be able to do something like this

data.frame(a = 1:10, b = roll_sum(1:10, 3, padNA = TRUE))

instead of this

data.frame(a = 1:10, b = c(rep(NA,2),roll_sum(1:10, 3))).

Of course data.frame(a = 1:x, b = roll_sum(1:x, n = 3)) throws an error, because for n > 1 roll_sum(1:x, n) returns only x - n + 1 values, which is shorter than length(1:x).
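To make the length mismatch concrete, here is a minimal base R sketch. Note that roll_sum_base is a hypothetical stand-in for roll_sum (it is not RcppRoll's implementation); it uses embed() so the example runs without any packages.

```r
# Hypothetical stand-in for roll_sum(), base R only (not RcppRoll's code).
roll_sum_base <- function(x, n) {
  # embed() builds a matrix with one row per complete window of width n
  rowSums(embed(x, n))
}

x <- 1:10
s <- roll_sum_base(x, 3)
length(s)  # length(x) - 3 + 1 complete windows

# Manual left padding restores the original length, so data.frame() accepts it
padded <- c(rep(NA, 3 - 1), s)
df <- data.frame(a = x, b = padded)
```

Passing s directly as a column next to a = 1:10 fails because data.frame() requires columns of matching length, which is exactly what a padding option would take care of.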

A good idea -- I might prefer e.g. pad.left or pad.right in case we want to control how padding occurs. Or maybe have pad be one of "left", "right", or NULL. I'll think about this. Thanks for the feature suggestion!

Hi Kevin, I suspect I'm missing something, but if roll_abc(x) rolls forward (1:length(x)) and there is no option to roll backward (length(x):1), then isn't left padding of NAs all we'd ever need?

If indeed there was an option to roll backward, if pad = TRUE then it could automatically pad the end of the vector with NAs.

Maybe, but I think there could be users who would still prefer different 'alignment', so that e.g. they prefer to get:

> data.frame(a = 1:5, b = c(NA, NA, roll_sum(1:5, 3)))
  a  b
1 1 NA
2 2 NA
3 3  6
4 4  9
5 5 12

while sometimes, someone might want

> data.frame(a = 1:5, b = c(roll_sum(1:5, 3), NA, NA))
  a  b
1 1  6
2 2  9
3 3 12
4 4 NA
5 5 NA

Or is that a rather awkward proposition?
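The two layouts above could be produced by one small helper. This is only a sketch under assumptions: pad_roll and its pad argument are hypothetical names chosen for illustration, not part of RcppRoll, and the rolling sum is computed in base R with embed().

```r
# Hypothetical helper (not part of RcppRoll): pad a rolled result with NAs
# on the chosen side so that it has n_total elements again.
pad_roll <- function(rolled, n_total, pad = c("left", "right")) {
  pad <- match.arg(pad)
  filler <- rep(NA, n_total - length(rolled))
  if (pad == "left") c(filler, rolled) else c(rolled, filler)
}

rolled <- rowSums(embed(1:5, 3))       # base R rolling sum of width 3
left   <- pad_roll(rolled, 5, "left")  # NAs first, as in the first table
right  <- pad_roll(rolled, 5, "right") # NAs last, as in the second table
```

In zoo's terminology the first layout corresponds to align = "right" (each result sits at the right edge of its window, so the NAs end up on the left), and the second to align = "left".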

It looks like zoo::rollapply() handles what you're proposing - that could be a good indicator that others would find it useful, though I suspect that left padding zeros may be a good default.

I guess I've lived a sheltered life when it comes to rolling operations :)

Let me second the request for this feature. It would be especially useful for users of dplyr, where the functions used can only return either 1 value or n values. We can't use roll_abc() with dplyr because it returns vectors that are lacking the appropriate NAs. So, for now, I use zoo::rollapply as discussed above. rollapply is nice, and I think widely used in the R finance community, but I suspect that roll_abc would be faster. I think that several of the options/approaches of rollapply are worth looking at, especially width, fill and align.

I was also mostly hoping to use RcppRoll with dplyr.
(Sent by email in reply to davidkane9's comment above, 06/07/2014 7:35 PM.)

Hi guys,

In the devel branch, I'm doing a big re-write. The 'main' exported functions now have the align and fill arguments, as from zoo::rollapply. You can try:

devtools::install_github("kevinushey/RcppRoll", ref = "devel")
library("RcppRoll")
roll_mean(1:5, 3L, fill = NA, align = "left")

to get a feel for it.

I will merge to master after I've considered a few more things:

  1. Re-introducing the by argument,
  2. Supporting partial, and
  3. Upgrading the rollit and rollit_raw functions to the new interface.

Not sure whether this difference to zoo is intended (I'm personally fine with it, as I use fill=NA a lot anyway):

> x <- 2:5
> rollapplyr(x, 2, mean)
[1] 2.5 3.5 4.5
> roll_meanr(x, 2)
[1]  NA 2.5 3.5 4.5

That is intended -- I found it strange that rollapplyr does not automatically set a fill (thereby making its behaviour identical to rollapply by default). I thought NA was the most sensible default here.

I've started by implementing a simple version of fill -- it can currently be a vector of length 1 or 3, specifying fills for the 'left padding', 'middle padding', and 'right padding' respectively.
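As a rough illustration of the length 1 or 3 fill described above, here is a base R sketch. The helper apply_fill is hypothetical and is not RcppRoll's code. It ignores the middle entry, since this example has no interior gaps, and it assumes that for even widths with centre alignment the extra padded slot goes on the right.

```r
# Hypothetical sketch (not RcppRoll's implementation) of a length-1 or
# length-3 fill: fill[1] pads the left, fill[3] pads the right.
apply_fill <- function(rolled, n_total,
                       align = c("center", "left", "right"), fill = NA) {
  align <- match.arg(align)
  if (length(fill) == 1) fill <- rep(fill, 3)
  stopifnot(length(fill) == 3)
  n_pad <- n_total - length(rolled)
  # Number of padded positions before the first complete window
  n_left <- switch(align,
    left   = 0,
    right  = n_pad,
    center = n_pad %/% 2  # assumption: any odd slot goes to the right
  )
  n_right <- n_pad - n_left
  c(rep(fill[[1]], n_left), rolled, rep(fill[[3]], n_right))
}

rolled   <- rowMeans(embed(1:5, 3))  # rolling mean of width 3: 2 3 4
centered <- apply_fill(rolled, 5, "center", fill = c(0, NA, 99))
righted  <- apply_fill(rolled, 5, "right", fill = NA)
```

Here centered gets 0 on the left and 99 on the right of the three means, while righted gets two leading NAs.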

Of course, there could be cases where someone wants to supply a vector to pad left with, e.g.

zoo::rollapplyr(1:5, 3, mean, fill = list(c(0, 1), NA, NA))

but I haven't implemented anything that general yet.

fill has now been implemented in a way that conforms with the behaviour of zoo's rollapply function, with the caveat that I still maintain the alternative default behaviour for the roll_*r and roll_*l functions.