ETHZ-INS/scanMiR

Give Michael a xgb CV snippet

michasou opened this issue · 1 comments

plger commented

So assuming you have a numeric matrix of predictors (preds) and a vector of values/labels (labs), you first use cross-validation to see at which boosting round you start to overfit:

res <- xgb.cv(data=preds, label=labs, nrounds=300, ..., early_stopping_rounds=2, nfold=5, subsample=0.75, nthread=8)
bi = res$best_iteration
# get the earliest round whose mean test RMSE is within 1 SD of the best iteration's:
ac = res$evaluation_log$test_rmse_mean[bi] + 1 * res$evaluation_log$test_rmse_std[bi]
nrounds = min(which(res$evaluation_log$test_rmse_mean <= ac))
# then you can run the actual fit on all the data:
fit <- xgboost(data=preds, label=labs, nrounds=nrounds, ..., early_stopping_rounds=2, nthread=1)
# and predict values:
predicted <- predict(fit, preds)
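
To make the 1-SD selection step concrete, here is a minimal base-R sketch that applies the same rule to a toy evaluation log. The numbers are made up purely for illustration (they are not from a real xgb.cv run), and the data frame only mimics the test_rmse_mean / test_rmse_std columns of res$evaluation_log:

```r
# Toy stand-in for res$evaluation_log (illustrative values, not real results)
evaluation_log <- data.frame(
  iter           = 1:9,
  test_rmse_mean = c(1.00, 0.80, 0.65, 0.56, 0.52, 0.50, 0.49, 0.495, 0.50),
  test_rmse_std  = c(0.05, 0.05, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04)
)

# Best iteration: the round with the lowest mean test RMSE
# (xgb.cv reports this as res$best_iteration when early stopping is used)
bi <- which.min(evaluation_log$test_rmse_mean)

# Acceptance threshold: best mean RMSE plus 1 SD at that round
ac <- evaluation_log$test_rmse_mean[bi] + 1 * evaluation_log$test_rmse_std[bi]

# Earliest round whose mean test RMSE is already within that threshold
nrounds <- min(which(evaluation_log$test_rmse_mean <= ac))

print(c(best_iteration = bi, nrounds = nrounds))
```

With these toy values the best iteration is round 7, but round 5 is already within 1 SD of it, so the final model would be fit with fewer rounds, which makes it a bit more conservative against overfitting.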