tidymodels/recipes

Figure out why `step_dummy()` is slow with many dummy variables

EmilHvitfeldt opened this issue · 1 comment

Originally posted in #1253. This might just be an artifact of us handling the predictors one by one, but it is a stark difference:

library(tidymodels)

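# Build a random two-level factor of length 100. The argument `x` is
# ignored; it only lets map() call this function once per column.
make_factor <- function(x) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
}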

x <- map(1:1001, make_factor) %>%
  set_names(c("outcome", paste0("x", 1:1000))) %>%
  as_tibble()

rec <- recipe(outcome ~ ., data = x) %>%
  step_dummy(all_nominal_predictors())

lr_mod <- logistic_reg()

lr_wf <- workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec)

tictoc::tic("with recipes")
tmp <- lr_wf %>% fit(data = x)
tictoc::toc()
#> with recipes: 5.496 sec elapsed

tictoc::tic("without recipes")
tmp <- lr_mod %>% fit(outcome ~ ., data = x)
tictoc::toc()
#> without recipes: 0.437 sec elapsed
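
To check whether the extra time is spent in the recipe itself rather than in the workflow or the model fit, one option is to time `prep()` and `bake()` on their own. This is a minimal sketch, not a diagnosis: it assumes the same kind of data as the reprex above, but on a smaller scale so it runs quickly. `n_pred`, `dat`, `prep_time`, and `bake_time` are illustrative names that do not appear in the original report, and no timings are claimed here.

```r
library(recipes)

set.seed(1)
n_pred <- 50  # illustrative; the reprex above uses 1000 predictors

# Same helper as in the reprex: the argument is ignored
make_factor <- function(x) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
}

# One outcome column plus n_pred two-level factor predictors
cols <- lapply(seq_len(n_pred + 1), make_factor)
names(cols) <- c("outcome", paste0("x", seq_len(n_pred)))
dat <- tibble::as_tibble(cols)

rec <- recipe(outcome ~ ., data = dat)
rec <- step_dummy(rec, all_nominal_predictors())

# Time estimation (prep) and application (bake) of the recipe separately
prep_time <- system.time(prepped <- prep(rec, training = dat))
bake_time <- system.time(baked <- bake(prepped, new_data = NULL))

prep_time[["elapsed"]]
bake_time[["elapsed"]]
```

Increasing `n_pred` and rerunning shows how the two timings grow with the number of dummy variables, which would help separate the cost of `step_dummy()` from the rest of the workflow.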
