TysonStanley/tidyfast

dt_fill raises recycle error all of a sudden

gernophil opened this issue ยท 8 comments

I have data wrangling script, where I just want to fill all NAs with the value above it. I used to do it this way:

library(data.table)
library(tidyfast)

col1 <- c("A", "B", "C", "D")
col2 <- c("X", NA, "Z", NA)

dt <- data.table(col1,
                 col2)

dt[, col2 := dt_fill(dt, col2, .direction = 'down')]

And this always worked, but now I get the error:

Error in `[.data.table`(dt, , `:=`(col2, dt_fill(dt, col2, .direction = "down"))) : 
  Supplied 2 items to be assigned to 4 items of column 'col2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

going back to 0.2.1 fixes the issue. Would be great, if there was a better solution to this.

Per git bisect, culprit is 55da55e

Thanks for reporting, will take a look this weekend

Thanks for posting. The update to dt_fill() makes it possible for tidyfast to leverage better coding with data.table. The code that should work for your situation:

library(data.table)
library(tidyfast)

col1 <- c("A", "B", "C", "D", "E")
col2 <- c("X", NA, "X", NA, "Z")
col3 <- c("1", "1", "1", "2", "2")

dt <- data.table(col1, col2, col3)

# dt[, col2 := dt_fill(dt, col2, .direction = 'down')] # errors

# simple use
dt1 = dt_fill(dt, col2, .direction = 'down')
dt1[]

# by group (col3)
dt2 = dt_fill(dt, col2, id = col3, .direction = 'updown')
dt2[]

Simplifies the code too.

Thanks for clarifying and thank for your work. I have two questions for this though:

  1. It seems that the new dt_fill() does not allow in place modification anymore. Is that correct?
  2. I use dt_fill() the way it worked before in some scripts. Now I need to change all of those and also tell people that use it, to change this (I know this could be easily done using git, bit not everyone uses that). Would it not have been better to make a new function instead of changing the already established dt_fill()?

Thanks.

  1. You can set immutable = FALSE to mutate in place.
  2. The goal of dt_fill() was to match tidyr::fill() while taking advantage of data.table. This change was to better meet that goal.

Beyond the annoyance of changing code (sorry about that!) is there another reason to keep the previous behavior? It's possible to keep the previous behavior (potentially with a new argument or a different function).

Thanks :).

I guess there is not or I cannot really make an educated decision here, since I only use it for that case :). Maybe there is a way to announce those changes since I just needed to quickly rerun a script to create an updated plot and it didn't work anymore, which did cost some time to narrow it down.

It works again now and it's a quick fix. Thanks again for the package :).

Yeah I can work on making any breaking changes more obvious. Thanks for highlighting how you were using the function. Didn't realize it was being used that way so this is helpful for me ๐Ÿ˜€