Slowness with record types (possibly my misuse of dplyr across)
Closed this issue · 1 comment
TimTaylor commented
I think I'm using across poorly, as the example below is crazily slow. Here I'm using clock, but I'm pretty sure that is not the bottleneck:
library(incidence2)
library(clock)
library(microbenchmark)
dat <- covidregionaldataUK
# default uses data.table
default <- function() {
  incidence(dat, date_index = date, groups = region, counts = ends_with("new"))
}
# here clock is just used as an example and is not the bottleneck
record <- function() {
  build_incidence(
    dat,
    date_index = date,
    groups = region,
    counts = ends_with("new"),
    FUN = function(x) calendar_narrow(as_year_month_day(x), precision = "day")
  )
}
microbenchmark(default(), record(), times = 10)
#> Unit: milliseconds
#>       expr         min          lq        mean      median          uq         max neval
#>  default()    5.053506    5.303756    7.184131    5.825458    6.297723    18.83551    10
#>   record() 4614.132282 4683.771452 4741.717130 4751.499357 4782.525466  4835.74966    10
Created on 2021-06-28 by the reprex package (v2.0.0)
TimTaylor commented
Closed by 85a17a6. Need to use lambda functions for better performance (see tidyverse/dplyr#5909)
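For reference, a minimal sketch of the pattern the linked dplyr issue compares: supplying an inline anonymous function to across() versus a purrr-style lambda. The toy data frame and column names below are invented for illustration and are not part of incidence2; whether the gap is as dramatic as above will depend on the dplyr version in use.

library(dplyr)
library(microbenchmark)

toy <- tibble(g = rep(letters, each = 1000), x = runif(26000))

# inline anonymous function, roughly the form the FUN argument above ends up as
anon <- function() {
  toy %>% group_by(g) %>% mutate(across(x, function(x) x - min(x)))
}

# purrr-style lambda, the form suggested in tidyverse/dplyr#5909
lambda <- function() {
  toy %>% group_by(g) %>% mutate(across(x, ~ .x - min(.x)))
}

microbenchmark(anon(), lambda(), times = 10)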