
Successive Halving and Hyperband in the mlr3 ecosystem

mlr3hyperband adds the optimization algorithms Successive Halving (Jamieson and Talwalkar 2016) and Hyperband (Li et al. 2018) to the mlr3 ecosystem. The implementation in mlr3hyperband features improved scheduling and parallelizes the evaluation of configurations. The package includes tuners for hyperparameter optimization in mlr3tuning and optimizers for black-box optimization in bbotk.
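As a rough illustration of the idea behind Successive Halving, the base R sketch below evaluates a set of candidates on a small budget, keeps the best 1/eta fraction, and multiplies the budget by eta until the maximum budget is reached. It is not the package's implementation: the objective `toy_loss()` and the function `successive_halving_sketch()` are hypothetical names invented for this example.

```r
# Toy Successive Halving in base R; all names are hypothetical and
# independent of mlr3hyperband.
set.seed(1)

# Hypothetical objective: lower is better, and more budget reduces the loss.
toy_loss = function(x, budget) (x - 0.3)^2 + 1 / budget

successive_halving_sketch = function(n, r_min, r_max, eta) {
  configs = runif(n)  # sample n candidate configurations in [0, 1]
  budget = r_min
  repeat {
    losses = toy_loss(configs, budget)
    if (budget >= r_max || length(configs) == 1) break
    # keep the best 1/eta fraction and raise the budget by the factor eta
    keep = max(1, floor(length(configs) / eta))
    configs = configs[order(losses)[seq_len(keep)]]
    budget = min(budget * eta, r_max)
  }
  list(best = configs[which.min(losses)], budget = budget)
}

res = successive_halving_sketch(n = 9, r_min = 27, r_max = 243, eta = 3)
print(res)
```

With these settings the sketch evaluates 9, 3 and 1 candidates at budgets 27, 81 and 243. Hyperband runs several such brackets with different trade-offs between the number of candidates and the starting budget.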


There are several sections about hyperparameter optimization in the mlr3book.

The gallery features a series of case studies on Hyperband.

Install the latest release from CRAN:

install.packages("mlr3hyperband")

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3hyperband")


We optimize the hyperparameters of an XGBoost model on the Pima Indians Diabetes data set (tsk("pima")). The number of boosting rounds nrounds is the fidelity parameter. We tag this parameter with "budget" in the search space.


library(mlr3hyperband)
library(mlr3learners)

learner = lrn("classif.xgboost",
  nrounds           = to_tune(p_int(27, 243, tags = "budget")),
  eta               = to_tune(1e-4, 1, logscale = TRUE),
  max_depth         = to_tune(1, 20),
  colsample_bytree  = to_tune(1e-1, 1),
  colsample_bylevel = to_tune(1e-1, 1),
  lambda            = to_tune(1e-3, 1e3, logscale = TRUE),
  alpha             = to_tune(1e-3, 1e3, logscale = TRUE),
  subsample         = to_tune(1e-1, 1)
)

We use the tune() function to run the optimization.

instance = tune(
  tnr("hyperband", eta = 3),
  task = tsk("pima"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce")
)

The instance contains the best-performing hyperparameter configuration.

instance$result
##    nrounds       eta max_depth colsample_bytree colsample_bylevel    lambda     alpha subsample
## 1:      27 -2.102951         3        0.7175178         0.5419011 -5.390012 -4.696385  0.193622
## 3 variables not shown: [learner_param_vals, x_domain, classif.ce]
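The negative values for eta, lambda and alpha may look surprising. These three parameters were declared with logscale = TRUE, so the result reports them as natural logarithms, while the x_domain column holds the back-transformed values. The short snippet below, which only uses base R and the numbers printed above, shows the back-transformation by hand; the variable names are chosen for this example.

```r
# Values copied from the result shown above; eta, lambda and alpha were tuned
# with logscale = TRUE, so they are reported as natural logarithms.
log_values = c(eta = -2.102951, lambda = -5.390012, alpha = -4.696385)

# Back-transform to the scale on which the search ranges were defined.
original_values = exp(log_values)
print(signif(original_values, 3))
```

The back-transformed values fall inside the ranges given to to_tune(), for example eta lies between 1e-4 and 1.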

The archive contains all evaluated hyperparameter configurations. Hyperband adds the "stage" and "bracket" columns.

as.data.table(instance$archive)[, .(stage, bracket, classif.ce, nrounds)]
##     stage bracket classif.ce nrounds
##  1:     0       2  0.3489583      27
##  2:     0       2  0.2434896      27
##  3:     0       2  0.2591146      27
##  4:     0       2  0.3489583      27
##  5:     0       2  0.5052083      27
## ---                                 
## 18:     0       0  0.2434896     243
## 19:     0       0  0.4960938     243
## 20:     0       0  0.2903646     243
## 21:     2       2  0.2473958     243
## 22:     1       1  0.2421875     243
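The 22 rows of the archive can be related to the bracket layout described by Li et al. (2018). The base R sketch below computes that layout for the budget range 27 to 243 and eta = 3 used in this example. The function name `schedule_sketch()` is hypothetical, and the code follows the published formulas rather than the package's internal scheduling, so treat it as an approximation.

```r
# Sketch of the Hyperband bracket layout (Li et al. 2018) in base R.
# schedule_sketch() is a hypothetical helper, not part of mlr3hyperband.
schedule_sketch = function(r_min, r_max, eta) {
  # largest bracket index; the small epsilon guards against rounding in log()
  s_max = floor(log(r_max / r_min) / log(eta) + 1e-8)
  rows = list()
  for (s in s_max:0) {
    # number of configurations sampled in the first stage of bracket s
    n = ceiling((s_max + 1) / (s + 1) * eta^s)
    for (i in 0:s) {
      rows[[length(rows) + 1]] = data.frame(
        bracket = s,
        stage   = i,
        budget  = r_max / eta^(s - i),  # budget grows by eta per stage
        n       = floor(n / eta^i)      # configurations shrink by eta per stage
      )
    }
  }
  do.call(rbind, rows)
}

schedule = schedule_sketch(r_min = 27, r_max = 243, eta = 3)
print(schedule)
sum(schedule$n)
```

Under these assumptions the sketch yields three brackets with 9, 5 and 3 initial configurations and 22 evaluations in total, which agrees with the number of rows in the archive above.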

We fit a final model with optimized hyperparameters to make predictions on new data.

learner$param_set$values = instance$result_learner_param_vals
learner$train(tsk("pima"))


