tidymodels/finetune

`tune_race_anova()` not handling very similar metrics well?

juliasilge opened this issue · 2 comments

This Stack Overflow question highlights a situation where finetune doesn't handle an edge case well:

library(tidymodels)
library(finetune)
#> Registered S3 method overwritten by 'finetune':
#>   method            from
#>   obj_sum.tune_race tune
data(cells, package = "modeldata")

set.seed(31)
split <- cells %>% 
  select(-case) %>%
  initial_split(prop = 0.8)

set.seed(234)
folds <- training(split) %>% vfold_cv(v = 3)
folds
#> #  3-fold cross-validation 
#> # A tibble: 3 × 2
#>   splits             id   
#>   <list>             <chr>
#> 1 <split [1076/539]> Fold1
#> 2 <split [1077/538]> Fold2
#> 3 <split [1077/538]> Fold3

xgb_spec <- boost_tree(mode = "classification", trees = tune()) 

set.seed(234)
workflow(class ~ ., xgb_spec) %>% 
  tune_grid(
    resamples = folds,
    grid = 5
  ) 
#> # Tuning results
#> # 3-fold cross-validation 
#> # A tibble: 3 × 4
#>   splits             id    .metrics          .notes          
#>   <list>             <chr> <list>            <list>          
#> 1 <split [1076/539]> Fold1 <tibble [10 × 5]> <tibble [0 × 3]>
#> 2 <split [1077/538]> Fold2 <tibble [10 × 5]> <tibble [0 × 3]>
#> 3 <split [1077/538]> Fold3 <tibble [10 × 5]> <tibble [0 × 3]>

set.seed(345)
workflow(class ~ ., xgb_spec) %>% 
  tune_race_anova(
    resamples = folds,
    grid = 5
  ) 
#> Error in `mutate()`:
#> ! Problem while computing `col = purrr::map(splits, ~NULL)`.
#> x `col` must be size 1, not 0.

Created on 2022-02-22 by the reprex package (v2.0.1)

The error comes from `tune:::pulley()`. My guess is that all the candidates are being removed at some step, perhaps because their metrics are too similar. Either way, the resulting error is pretty confusing.

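The issue is that there are not enough resamples to do racing: the number of resamples (3 here) must be greater than the number of burn-in resamples (3 by default, set via `control_race()`). We'll now give a better error message: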
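To make the condition concrete, here is a minimal base R sketch of the kind of check involved. The function name `check_resamples_sketch()` and its arguments are hypothetical, chosen for illustration; this is not finetune's internal code, only the comparison that the error message describes.

```r
# Hypothetical stand-in for the resample check; NOT finetune's actual code.
# Racing needs at least one resample left over after the burn-in resamples.
check_resamples_sketch <- function(num_resamples, burn_in = 3) {
  if (num_resamples <= burn_in) {
    stop(
      "The number of resamples (", num_resamples,
      ") needs to be more than the number of burn-in resamples (",
      burn_in, ").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 5 resamples with a burn-in of 3: the check passes
ok <- check_resamples_sketch(5)

# 3 resamples with a burn-in of 3: the check fails; capture the message
msg <- tryCatch(
  check_resamples_sketch(3),
  error = function(e) conditionMessage(e)
)
msg
```

With 3-fold cross-validation and a burn-in of 3, every resample is used up before any candidate can be eliminated, which is why racing cannot proceed.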

library(tidymodels)
library(finetune)

data(cells, package = "modeldata")

set.seed(31)
split <- cells %>% 
  select(-case) %>%
  initial_split(prop = 0.8)

set.seed(234)
folds_3 <- training(split) %>% vfold_cv(v = 3)
folds_3
#> #  3-fold cross-validation 
#> # A tibble: 3 × 2
#>   splits             id   
#>   <list>             <chr>
#> 1 <split [1076/539]> Fold1
#> 2 <split [1077/538]> Fold2
#> 3 <split [1077/538]> Fold3

xgb_spec <- boost_tree(mode = "classification", trees = tune()) 

set.seed(345)
res <- 
  workflow(class ~ ., xgb_spec) %>% 
  tune_race_anova(
    resamples = folds_3,
    grid = 5
  ) 
#> Error:
#> ! The number of resamples (3) needs to be more than the number of burn-in resamples (3) set by the control function `control_race()`.

#> Backtrace:
#>     ▆
#>  1. ├─workflow(class ~ ., xgb_spec) %>% ...
#>  2. ├─finetune::tune_race_anova(., resamples = folds_3, grid = 5)
#>  3. └─finetune:::tune_race_anova.workflow(., resamples = folds_3, grid = 5) at finetune/R/tune_race_anova.R:97:2
#>  4.   └─finetune:::tune_race_anova_workflow(...) at finetune/R/tune_race_anova.R:176:2
#>  5.     └─finetune:::check_num_resamples(B, min_rs) at finetune/R/tune_race_anova.R:200:4
#>  6.       └─rlang::abort(...) at finetune/R/tune_race_anova.R:291:4

# It does work with more than 3 resamples
set.seed(234)
folds_5 <- training(split) %>% vfold_cv(v = 5)
folds_5
#> #  5-fold cross-validation 
#> # A tibble: 5 × 2
#>   splits             id   
#>   <list>             <chr>
#> 1 <split [1292/323]> Fold1
#> 2 <split [1292/323]> Fold2
#> 3 <split [1292/323]> Fold3
#> 4 <split [1292/323]> Fold4
#> 5 <split [1292/323]> Fold5

xgb_spec <- boost_tree(mode = "classification", trees = tune()) 

set.seed(345)
res <- 
  workflow(class ~ ., xgb_spec) %>% 
  tune_race_anova(
    resamples = folds_5,
    grid = 5
  ) 

plot_race(res)

Created on 2022-09-05 with reprex v2.0.2
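If you want to stay with 3 folds, two possible workarounds are sketched below. Both are assumptions layered on the reprex above and were not run as part of this issue: the first repeats the cross-validation so there are more resamples than burn-in resamples, and the second lowers `burn_in` in `control_race()` (this assumes a value of 2 is accepted by your installed version of finetune).

```r
library(tidymodels)
library(finetune)

data(cells, package = "modeldata")

set.seed(31)
split <- cells %>%
  select(-case) %>%
  initial_split(prop = 0.8)

xgb_spec <- boost_tree(mode = "classification", trees = tune())

# Option 1: repeat the 3-fold CV twice, giving 3 x 2 = 6 resamples,
# which is more than the default burn-in of 3
set.seed(234)
folds_3x2 <- training(split) %>% vfold_cv(v = 3, repeats = 2)

set.seed(345)
res_repeats <-
  workflow(class ~ ., xgb_spec) %>%
  tune_race_anova(
    resamples = folds_3x2,
    grid = 5
  )

# Option 2 (assumption: burn_in = 2 is allowed): keep the 3 folds but
# start eliminating candidates after 2 resamples instead of 3
set.seed(234)
folds_3 <- training(split) %>% vfold_cv(v = 3)

set.seed(345)
res_burn_in <-
  workflow(class ~ ., xgb_spec) %>%
  tune_race_anova(
    resamples = folds_3,
    grid = 5,
    control = control_race(burn_in = 2)
  )
```

Option 1 costs more model fits but leaves the racing defaults alone. Option 2 is cheaper, but filtering on only two resamples gives the ANOVA less information to work with.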
