GBDT-PL

This is the implementation for the paper "Gradient Boosting with Piece-Wise Linear Regression Trees". We extend gradient boosting to use piecewise linear regression trees (PL Trees) instead of piecewise constant regression trees. PL Trees can accelerate the convergence of GBDT. Moreover, our new algorithm fits modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism better. We name our new algorithm GBDT-PL.
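As a rough illustration of the difference, the sketch below fits the two kinds of leaf models with plain NumPy: one constant output per leaf versus one linear model over the leaf's samples, both minimizing the usual second-order boosting objective with an L2 penalty. This is only a conceptual sketch with illustrative function names; the actual implementation in this repository is histogram-based and SIMD-optimized.

```python
import numpy as np

def constant_leaf_output(grad, hess, lam=0.01):
    # Piecewise constant leaf: a single value per leaf, the usual
    # second-order boosting optimum  c* = -sum(g) / (sum(h) + lambda).
    return -grad.sum() / (hess.sum() + lam)

def linear_leaf_output(X_leaf, grad, hess, lam=0.01):
    # Piecewise linear (PL Tree) leaf: fit a linear model w on the leaf's
    # samples by minimizing  sum_i [ g_i * x_i^T w + 0.5 * h_i * (x_i^T w)^2 ]
    # + 0.5 * lambda * ||w||^2, which has the closed form
    #   w* = -(X^T diag(h) X + lambda I)^{-1} X^T g.
    Xb = np.hstack([X_leaf, np.ones((len(X_leaf), 1))])       # add a bias column
    A = Xb.T @ (hess[:, None] * Xb) + lam * np.eye(Xb.shape[1])
    b = -Xb.T @ grad
    w = np.linalg.solve(A, b)
    return Xb @ w                                              # per-sample leaf output
```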

News

An approximate version of GBDT-PL has been implemented in LightGBM and can be enabled with the linear_tree=true parameter.
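For example, with LightGBM's Python package the linear-tree mode can be switched on by passing linear_tree in the parameter dictionary. The data and remaining parameters below are placeholders to keep the sketch self-contained:

```python
import numpy as np
import lightgbm as lgb

# Toy regression data, only to make the example runnable.
X = np.random.rand(1000, 10)
y = 3.0 * X[:, 0] + np.random.randn(1000) * 0.1

params = {
    "objective": "regression",
    "linear_tree": True,     # fit a linear model in each leaf
    "learning_rate": 0.1,
    "num_leaves": 255,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```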

Experiments

We evaluate our algorithm on 10 public datasets.

| Dataset Name   | #Training  | #Testing  | #Features | Task           | Link           |
| -------------- | ---------- | --------- | --------- | -------------- | -------------- |
| Higgs          | 10,000,000 | 500,000   | 28        | Classification | higgs          |
| Epsilon        | 400,000    | 100,000   | 2000      | Classification | epsilon        |
| HEPMASS        | 7,000,000  | 3,500,000 | 28        | Classification | hepmass        |
| SUSY           | 4,000,000  | 1,000,000 | 18        | Classification | susy           |
| CASP           | 29,999     | 15,731    | 9         | Regression     | casp           |
| SGEMM          | 193,280    | 48,320    | 14        | Regression     | sgemm          |
| SUPERCONDUCTOR | 17,008     | 4,255     | 81        | Regression     | superconductor |
| CT             | 29,999     | 15,731    | 384       | Regression     | ct             |
| Energy         | 29,999     | 15,788    | 27        | Regression     | energy         |
| Year           | 412,206    | 103,139   | 90        | Regression     | year           |

Accuracy

We compare the accuracy in the following table (AUC for classification, RMSE for regression). For XGBoost, LightGBM, and CatBoost, different hyperparameter settings are tried and the best result is picked for each algorithm. For GBDT-PL, we set aside 20% of the training data for validation. Details of the settings can be found in the Parameter Setting section of Experiment Setting.

| Dataset Name   | LightGBM | XGBoost  | CatBoost | GBDT-PL  |
| -------------- | -------- | -------- | -------- | -------- |
| Higgs          | 0.854025 | 0.854147 | 0.851590 | 0.860198 |
| Epsilon        | 0.951422 | 0.948292 | 0.957327 | 0.957894 |
| HEPMASS        | 0.95563  | 0.95567  | 0.95554  | 0.95652  |
| SUSY           | 0.878112 | 0.877825 | 0.878206 | 0.878287 |
| CASP           | 3.4961   | 3.4939   | 3.5183   | 3.4574   |
| SGEMM          | 4.61431  | 4.37929  | 4.41177  | 4.16871  |
| SUPERCONDUCTOR | 8.80776  | 8.91063  | 8.78452  | 8.79527  |
| CT             | 1.30902  | 1.34131  | 1.36937  | 1.23753  |
| Energy         | 64.256   | 64.780   | 65.761   | 65.462   |
| Year           | 8.38817  | 8.37935  | 8.42593  | 8.37233  |
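As noted above, GBDT-PL's hyperparameters are selected on a held-out 20% of the training data. A minimal sketch of such a split, using NumPy only and an illustrative function name:

```python
import numpy as np

def holdout_split(X, y, valid_fraction=0.2, seed=0):
    # Shuffle once, then hold out the requested fraction for validation.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_valid = int(len(X) * valid_fraction)
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return X[train_idx], y[train_idx], X[valid_idx], y[valid_idx]
```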

Training Time

We use the histogram versions of XGBoost and LightGBM. Each tree in our experiments has at most 255 leaves. For CatBoost, we constrain the maximum tree depth to 8. The learning rate is 0.1. We test both 63 bins and 255 bins for the histograms. lambda, the coefficient of the regularization terms, is set to 0.01. min sum of hessians, the minimum sum of hessians allowed in each leaf, is set to 100 to prevent the trees from growing too deep. For GBDT-PL, we use at most 5 regressors per leaf.

| max leaves | learning rate | bins   | lambda | min sum of hessians |
| ---------- | ------------- | ------ | ------ | ------------------- |
| 255        | 0.1           | 63/255 | 0.01   | 100                 |
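For reference, these settings correspond roughly to the following LightGBM parameter names. This is shown only as an example of how one of the baselines is configured; GBDT-PL and the other libraries use their own configuration keys.

```python
# Approximate mapping of the table above onto LightGBM parameter names.
lightgbm_params = {
    "num_leaves": 255,                 # max leaves per tree
    "learning_rate": 0.1,
    "max_bin": 63,                     # a second run uses 255 bins
    "lambda_l2": 0.01,                 # regularization coefficient (lambda)
    "min_sum_hessian_in_leaf": 100.0,  # min sum of hessians per leaf
    "num_threads": 24,                 # thread count used in all experiments
}
```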

We plot the accuracy on the testing sets w.r.t. training time in Figure 2. To exclude the effect of evaluation time, we do two separate runs for each dataset. In the first run we record only the training time per iteration, without doing any evaluation. In the second run we evaluate the accuracy after every iteration. We use 24 threads for all algorithms. Preprocessing time is excluded.
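A minimal sketch of this two-run protocol, assuming hypothetical train_one_iteration and evaluate_test callables (the actual experiments are driven by each library's own executables and config files):

```python
import time

def run_timing(train_one_iteration, num_iterations=500):
    # Run 1: record pure training time per iteration, no evaluation.
    times = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        train_one_iteration()                  # hypothetical single-iteration call
        times.append(time.perf_counter() - start)
    return times

def run_accuracy(train_one_iteration, evaluate_test, num_iterations=500):
    # Run 2: a separate training run that evaluates the test metric
    # after every iteration (its timing is ignored).
    scores = []
    for _ in range(num_iterations):
        train_one_iteration()
        scores.append(evaluate_test())         # hypothetical test-set metric
    return scores

# The accuracy-vs-time curves pair the cumulative times from run 1
# with the per-iteration scores from run 2.
```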

Convergence Rate

We run 500 iterations and plot the testing accuracy per iteration. For classification tasks, we use AUC as the metric; for regression, we use RMSE.
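For completeness, a small sketch of the two metrics using scikit-learn (assumed here purely for illustration; any equivalent implementation works):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

def test_metric(y_true, y_pred, task):
    # AUC for the classification datasets (higher is better),
    # RMSE for the regression datasets (lower is better).
    if task == "classification":
        return roc_auc_score(y_true, y_pred)
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))
```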

Reference Paper

Yu Shi, Jian Li, Zhize Li. Gradient Boosting with Piece-Wise Linear Regression Trees. Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2019: 3432-3438.