Figure out why `step_dummy()` is slow with many dummy variables
EmilHvitfeldt opened this issue · 1 comments
EmilHvitfeldt commented
Originally posted in #1253. This might just be an artifact of us handling the predictors one by one, but it is a stark difference:
```r
library(tidymodels)

# Each call returns a two-level factor of length 100; `x` is only the map index
make_factor <- function(x) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
}

# One outcome column plus 1000 factor predictors
x <- map(1:1001, make_factor) %>%
  set_names(c("outcome", paste0("x", 1:1000))) %>%
  as_tibble()

rec <- recipe(outcome ~ ., data = x) %>%
  step_dummy(all_nominal_predictors())

lr_mod <- logistic_reg()

lr_wf <- workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec)

tictoc::tic("with recipes")
tmp <- lr_wf %>% fit(data = x)
tictoc::toc()
#> with recipes: 5.496 sec elapsed

tictoc::tic("without recipes")
tmp <- lr_mod %>% fit(outcome ~ ., data = x)
tictoc::toc()
#> without recipes: 0.437 sec elapsed
```
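To narrow down where the time goes, one option is to time the recipe by itself, away from the workflow and the model fit. The sketch below is only a starting point and not a diagnosis. It assumes that comparing `prep()` and `bake()` against a single `model.matrix()` call is a fair baseline, and it uses 200 predictors instead of 1000 to keep the run short. The names `n_pred`, `dat`, and `timings` are made up for this example.

```r
library(recipes)

set.seed(1)
n_pred <- 200  # assumption: smaller than the 1000 in the reprex, to keep this quick

# Build a data frame of two-level factors, mirroring the reprex above
cols <- replicate(
  n_pred + 1,
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B")),
  simplify = FALSE
)
names(cols) <- c("outcome", paste0("x", seq_len(n_pred)))
dat <- as.data.frame(cols)

rec <- recipe(outcome ~ ., data = dat)
rec <- step_dummy(rec, all_nominal_predictors())

# Time the recipe on its own: estimation (prep) and application (bake)
prep_time <- system.time(prepped <- prep(rec, training = dat))
bake_time <- system.time(baked <- bake(prepped, new_data = NULL))

# Baseline: base R builds all indicator columns in one call
mm_time <- system.time(mm <- model.matrix(outcome ~ ., data = dat))

# Elapsed seconds for each stage
timings <- c(
  prep = prep_time[["elapsed"]],
  bake = bake_time[["elapsed"]],
  model_matrix = mm_time[["elapsed"]]
)
timings
```

If `prep` or `bake` dominates and grows faster than linearly as `n_pred` increases, that would point at the per-predictor handling rather than at the workflow or the model fit.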