Slowness with record types (possibly my misuse of dplyr across)
Closed this issue · 1 comment
TimTaylor commented
I think I'm using across poorly, as the example below is crazily slow. Here I'm using clock, but I'm pretty sure that is not the bottleneck:
library(incidence2)
library(clock)
library(microbenchmark)
dat <- covidregionaldataUK
# default uses data.table
default <- function() {
  incidence(dat, date_index = date, groups = region, counts = ends_with("new"))
}
# here clock is just used as an example and is not the bottleneck
record <- function() {
  build_incidence(
    dat,
    date_index = date,
    groups = region,
    counts = ends_with("new"),
    FUN = function(x) calendar_narrow(as_year_month_day(x), precision = "day")
  )
}
microbenchmark(default(), record(), times = 10)
#> Unit: milliseconds
#>       expr         min          lq        mean      median          uq         max neval
#>  default()    5.053506    5.303756    7.184131    5.825458    6.297723    18.83551    10
#>   record() 4614.132282 4683.771452 4741.717130 4751.499357 4782.525466  4835.74966    10
Created on 2021-06-28 by the reprex package (v2.0.0)
TimTaylor commented
Closed by 85a17a6. Need to use lambda functions for better performance (see tidyverse/dplyr#5909)
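For reference, a minimal sketch of the pattern the linked dplyr issue compares: supplying an inline anonymous function to across() versus a purrr-style lambda. The toy data frame and column names below are invented for illustration and are not part of incidence2; whether the gap is as dramatic as above will depend on the dplyr version in use.

library(dplyr)
library(microbenchmark)

toy <- tibble(g = rep(letters, each = 1000), x = runif(26000))

# inline anonymous function, roughly the form the FUN argument above ends up as
anon <- function() {
  toy %>% group_by(g) %>% mutate(across(x, function(x) x - min(x)))
}

# purrr-style lambda, the form suggested in tidyverse/dplyr#5909
lambda <- function() {
  toy %>% group_by(g) %>% mutate(across(x, ~ .x - min(.x)))
}

microbenchmark(anon(), lambda(), times = 10)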