stackgbm

Lifecycle: experimental

stackgbm offers a minimalist implementation of model stacking (Wolpert, 1992) for gradient boosted tree models built by xgboost (Chen and Guestrin, 2016), lightgbm (Ke et al., 2017), and catboost (Prokhorenkova et al., 2018).

Install

First, make sure to install two R packages that are not yet available from CRAN as of June 2020: the R packages for lightgbm and catboost. Follow the installation instructions in their official documentation.

Then install stackgbm from GitHub:

remotes::install_github("nanxstats/stackgbm")

Design

stackgbm implements a classic two-layer stacking model: the first layer generates "features" produced by gradient boosted tree models, and the second layer is a logistic regression that uses these features as inputs. The code is derived from our second place solution to a precisionFDA brain cancer machine learning challenge in 2020.
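
To illustrate the general idea (this is a minimal sketch of two-layer stacking, not stackgbm's internal implementation), here is an example with a single base learner; stackgbm itself combines xgboost, lightgbm, and catboost in the first layer. The simulated data and variable names are for illustration only.

library(xgboost)

set.seed(42)

# Simulated binary classification data (illustration only)
n <- 500
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Layer 1: out-of-fold predictions from a gradient boosted tree model
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))
oof <- numeric(n)
for (i in seq_len(k)) {
  tr <- folds != i
  fit <- xgboost(
    data = x[tr, ], label = y[tr],
    nrounds = 100, max_depth = 3, eta = 0.1,
    objective = "binary:logistic", verbose = 0
  )
  oof[!tr] <- predict(fit, x[!tr, ])
}

# Layer 2: logistic regression on the layer 1 "features"
meta <- glm(y ~ oof, family = binomial())
summary(meta)

Using out-of-fold predictions in the first layer avoids leaking training labels into the second-layer inputs.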

To make sure the package is easy to understand, modify, and extend, we chose to build it with base R, without any special frameworks or dialects. We also expose only the most essential tunable parameters for the boosted tree models: the learning rate, the maximum depth of a tree, and the number of iterations.
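
For reference, these three parameters go by different conventional names in each library (this summarizes the libraries' own naming, not stackgbm's argument names):

learning rate: eta (xgboost), learning_rate (lightgbm), learning_rate (catboost)
maximum tree depth: max_depth (xgboost), max_depth (lightgbm), depth (catboost)
number of iterations: nrounds (xgboost), num_iterations (lightgbm), iterations (catboost)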

License

stackgbm is free and open source software, licensed under GPL-3.