`tune_race_anova()` not handling very similar metrics well?
juliasilge opened this issue · 2 comments
juliasilge commented
This Stack Overflow question seems to highlight an edge case that finetune isn't handling very well:
library(tidymodels)
library(finetune)
#> Registered S3 method overwritten by 'finetune':
#>   method            from
#>   obj_sum.tune_race tune
data(cells, package = "modeldata")
set.seed(31)
split <- cells %>%
  select(-case) %>%
  initial_split(prop = 0.8)
set.seed(234)
folds <- training(split) %>% vfold_cv(v = 3)
folds
#> # 3-fold cross-validation
#> # A tibble: 3 × 2
#>   splits             id
#>   <list>             <chr>
#> 1 <split [1076/539]> Fold1
#> 2 <split [1077/538]> Fold2
#> 3 <split [1077/538]> Fold3
xgb_spec <- boost_tree(mode = "classification", trees = tune())
set.seed(234)
workflow(class ~ ., xgb_spec) %>%
  tune_grid(
    resamples = folds,
    grid = 5
  )
#> # Tuning results
#> # 3-fold cross-validation
#> # A tibble: 3 × 4
#>   splits             id    .metrics          .notes
#>   <list>             <chr> <list>            <list>
#> 1 <split [1076/539]> Fold1 <tibble [10 × 5]> <tibble [0 × 3]>
#> 2 <split [1077/538]> Fold2 <tibble [10 × 5]> <tibble [0 × 3]>
#> 3 <split [1077/538]> Fold3 <tibble [10 × 5]> <tibble [0 × 3]>
set.seed(345)
workflow(class ~ ., xgb_spec) %>%
  tune_race_anova(
    resamples = folds,
    grid = 5
  )
#> Error in `mutate()`:
#> ! Problem while computing `col = purrr::map(splits, ~NULL)`.
#> x `col` must be size 1, not 0.
Created on 2022-02-22 by the reprex package (v2.0.1)
The error is coming from `tune:::pulley()`, and I think it may be removing all the candidates at some step, perhaps because they are too similar? Either way, it results in a pretty confusing error.
topepo commented
The issue is that there are not enough resamples to do racing. We'll now give a better error message:
library(tidymodels)
library(finetune)
data(cells, package = "modeldata")
set.seed(31)
split <- cells %>%
  select(-case) %>%
  initial_split(prop = 0.8)
set.seed(234)
folds_3 <- training(split) %>% vfold_cv(v = 3)
folds_3
#> # 3-fold cross-validation
#> # A tibble: 3 × 2
#>   splits             id
#>   <list>             <chr>
#> 1 <split [1076/539]> Fold1
#> 2 <split [1077/538]> Fold2
#> 3 <split [1077/538]> Fold3
xgb_spec <- boost_tree(mode = "classification", trees = tune())
set.seed(345)
res <-
  workflow(class ~ ., xgb_spec) %>%
  tune_race_anova(
    resamples = folds_3,
    grid = 5
  )
#> Error:
#> ! The number of resamples (3) needs to be more than the number of burn-in resamples (3) set by the control function `control_race()`.
#> Backtrace:
#> ▆
#> 1. ├─workflow(class ~ ., xgb_spec) %>% ...
#> 2. ├─finetune::tune_race_anova(., resamples = folds_3, grid = 5)
#> 3. └─finetune:::tune_race_anova.workflow(., resamples = folds_3, grid = 5) at finetune/R/tune_race_anova.R:97:2
#> 4. └─finetune:::tune_race_anova_workflow(...) at finetune/R/tune_race_anova.R:176:2
#> 5. └─finetune:::check_num_resamples(B, min_rs) at finetune/R/tune_race_anova.R:200:4
#> 6. └─rlang::abort(...) at finetune/R/tune_race_anova.R:291:4
# it does work with more than 3 resamples, though
set.seed(234)
folds_5 <- training(split) %>% vfold_cv(v = 5)
folds_5
#> # 5-fold cross-validation
#> # A tibble: 5 × 2
#>   splits             id
#>   <list>             <chr>
#> 1 <split [1292/323]> Fold1
#> 2 <split [1292/323]> Fold2
#> 3 <split [1292/323]> Fold3
#> 4 <split [1292/323]> Fold4
#> 5 <split [1292/323]> Fold5
xgb_spec <- boost_tree(mode = "classification", trees = tune())
set.seed(345)
res <-
  workflow(class ~ ., xgb_spec) %>%
  tune_race_anova(
    resamples = folds_5,
    grid = 5
  )
plot_race(res)
Created on 2022-09-05 with reprex v2.0.2
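The new error message states the rule directly: racing needs strictly more resamples than the number of burn-in resamples set by `control_race()`, which is 3 in the output above. The base-R sketch below only illustrates that comparison. `enough_resamples_for_racing()` is a hypothetical helper and is not finetune's internal `check_num_resamples()`, whose source isn't reproduced here.

```r
# Hypothetical helper (not part of finetune): mirrors the rule stated in the
# error message, i.e. racing needs more resamples than burn-in resamples.
enough_resamples_for_racing <- function(n_resamples, burn_in = 3) {
  n_resamples > burn_in
}

enough_resamples_for_racing(3)               # FALSE: 3 folds, burn-in of 3
enough_resamples_for_racing(5)               # TRUE: 5 folds, burn-in of 3
enough_resamples_for_racing(3, burn_in = 2)  # TRUE: 3 folds, burn-in of 2
```

If you need to stay with 3 folds, lowering the `burn_in` argument (for example `control = control_race(burn_in = 2)` in `tune_race_anova()`) should satisfy this check. This is an untested suggestion. Elimination decisions would then rest on fewer resamples, and finetune may enforce its own lower bound on `burn_in`.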
github-actions commented
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.