Extended experiments on multi-output benchmark datasets
Closed this issue · 8 comments
Hi, I recently found this work on arXiv. It solves an important problem in traditional tree boosting systems, and the approach is solid.
I would like to ask whether the proposed method has been evaluated on benchmark datasets for multi-output regression (such as those at http://mulan.sourceforge.net/datasets-mtr.html).
I attempted to do this myself, but found tuning the hyper-parameters a bit tricky... Thanks :)
Thanks for your interest. I am sorry to say that our method has not been evaluated on those datasets.
Based on my experiments, the following tips might be useful for regression tasks:
- The minimum number of samples per leaf should be small.
- If the dataset is small (i.e., has few samples), it is better to use the exact split algorithm instead of the histogram approximation. Unfortunately, this is not supported in the current version of GBDTMO, but I will add this function ASAP.
- Preprocessing of the outputs is necessary. For example, the dynamic ranges of the different outputs should be (nearly) the same; otherwise, the loss is unbalanced.
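The last tip can be sketched as follows. This is a minimal, hypothetical example (the data and variable names are illustrative, not from GBDTMO): each output column is standardized so that no single output dominates a squared-error loss.

```python
import numpy as np

# Hypothetical multi-output target matrix: 3 outputs whose dynamic
# ranges differ by several orders of magnitude.
Y = np.array([[1.0, 100.0, 0.001],
              [2.0, 300.0, 0.003],
              [3.0, 200.0, 0.002],
              [4.0, 400.0, 0.004]])

# Standardize each output column to zero mean and unit variance,
# so every output contributes comparably to the training loss.
mean = Y.mean(axis=0)
std = Y.std(axis=0)
Y_scaled = (Y - mean) / std

# After training, predictions in the scaled space can be mapped
# back to the original units: Y_pred_original = Y_pred * std + mean.
```

Any per-column affine transform works here; the point is only that the outputs end up on comparable scales before training.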
Hi, thanks for your great work on GBDT.
Will you implement this idea in LightGBM (or XGBoost)? That would be useful.
@stanpcf
I would like to treat this as an independent project that extends the functionality of GBDT. Multiple outputs is only the first step. Now, I am trying to develop novel regularizations and multi-layer training of GBDT. After all of these are finished, I will consider integrating this project into LightGBM or XGBoost.
However, developing these new algorithms may take a long time, so I am afraid I can't integrate this project into LightGBM or XGBoost in the coming months.
@stanpcf
Take a look at this repo (https://github.com/GBDT-PL/GBDT-PL), which might be useful.
@zzd1992 There is really a gap between research and development. Thanks for your reply.
@AaronX121 It's a good idea to have GBDT fit the outputs. I dived into GBDT-PL for months and developed some algorithms on it, such as LTR and DART, but I eventually abandoned that repo because it had too many bugs to fix. I later re-implemented the GBDT-PL ideas on another GBDT library. I have also found research that fits the nodes with other algorithms, such as SVM.