Give Michael a xgb CV snippet
michasou opened this issue · 1 comment
michasou commented
Give Michael a xgb CV snippet
plger commented
So assuming you have a numeric matrix of predictors (preds) and a vector of values/labels (labs), you first use cross-validation to see at which boosting round you start to overfit:
res <- xgb.cv(data=preds, label=labs, nrounds=300, ..., early_stopping_rounds=2, nfold=5, subsample=0.75, nthread=8)
bi = res$best_iteration
# get the smallest number of rounds whose test RMSE is within 1 SD of the best iteration:
ac = res$evaluation_log$test_rmse_mean[bi] + res$evaluation_log$test_rmse_std[bi]
nrounds = min(which(res$evaluation_log$test_rmse_mean <= ac))
# then you can run the actual fit on all the data:
fit <- xgboost(data=preds, label=labs, nrounds=nrounds, ..., early_stopping_rounds=2, nthread=1)
# and predict values:
predicted <- predict(fit, preds)