cerlymarco/linear-tree

Performance and possibility to split only on subset of features

Closed this issue · 2 comments

Hey, I have been playing around a lot with your linear trees and like them very much. Thanks!

Nevertheless, I am somewhat disappointed by the runtime performance. Compared to XGBoost regressors (I know it's not a fair comparison) or linear regression (also not fair), the linear tree is really, really slow.
With 50k observations and 80 features, training takes 2 s for linear regression, 27 s for XGBoost, and 300 s for the linear tree.
Have you seen similar runtimes, or might I be using it wrong?

Another aspect that interests me is whether it is possible to limit the features that are used for splits. I haven't found such an option in the code. Any chance of seeing it in the future?
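To make both questions concrete, here is a simplified, hypothetical sketch of the split search for a single node of a linear tree, written with NumPy. It is not the linear-tree implementation: the helper names best_split and leaf_sse, the candidate_features argument, the quantile binning, and the sum-of-squared-errors criterion are all assumptions for illustration. It shows that each candidate threshold costs two least-squares fits (which is why runtime grows with features and bins), and that restricting candidate_features to a subset shrinks the search while the leaf models still use every column.

```python
import numpy as np


def leaf_sse(X, y):
    """Sum of squared residuals of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def best_split(X, y, candidate_features, n_bins=10, min_samples=5):
    """Search split thresholds only on `candidate_features` (hypothetical helper).

    Every child still fits a linear model on all columns of X, so the
    restriction only changes where the node may be cut, not the leaf models.
    Returns (feature, threshold, total_sse); feature is None if no split
    beats keeping the node as a single leaf.
    """
    best_feature, best_threshold = None, None
    best_sse = leaf_sse(X, y)  # baseline: no split
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]  # interior quantile levels
    for j in candidate_features:
        # Candidate cut points: interior quantiles of feature j (simple binning).
        for t in np.unique(np.quantile(X[:, j], qs)):
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_samples or len(y) - n_left < min_samples:
                continue
            # Two least-squares fits per candidate threshold: the costly part.
            sse = leaf_sse(X[left], y[left]) + leaf_sse(X[~left], y[~left])
            if sse < best_sse:
                best_feature, best_threshold, best_sse = j, float(t), sse
    return best_feature, best_threshold, best_sse


# Toy data: y is linear in x1, but its slope flips sign at x0 = 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = np.where(X[:, 0] <= 0, 2.0 * X[:, 1], -3.0 * X[:, 1]) + 0.01 * rng.normal(size=500)

# Only feature 0 may be used for the cut; the leaf models still see x0, x1, x2.
feature, threshold, sse = best_split(X, y, candidate_features=[0])
```

With candidate_features=[0] the search runs about n_bins - 1 thresholds on one feature instead of on all three, so the number of least-squares fits drops roughly in proportion to the number of excluded features.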

Hi, thanks for your feedback!

The computation time is certainly higher than for a single linear regression or decision tree (from sklearn), because the operations are more complex: linear models are fitted while evaluating the candidate splits. I also don't expect it to be faster than XGBoost or LightGBM linear trees (even considering only one tree), since the implementations are very different. To reduce the training time, you can consider reducing max_depth, max_bins, or the expected number of samples in the leaves.
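A rough back-of-the-envelope model shows why max_depth and max_bins matter so much. This is an assumption, not how the library counts its work: suppose a full binary tree with max_depth levels of splits, where every internal node tries n_thresholds cut points (a number roughly tied to max_bins) on each feature and fits one linear model per child. The helper estimated_fits and the parameter values below are hypothetical and are not the library's defaults.

```python
def estimated_fits(n_features, n_thresholds, max_depth):
    """Hypothetical upper bound on linear-model fits during split search.

    Assumes a full binary tree with max_depth levels of splits
    (2**max_depth - 1 internal nodes). Each internal node tries
    n_thresholds cut points on every feature and fits one linear model
    per child, i.e. two fits per cut point.
    """
    internal_nodes = 2 ** max_depth - 1
    return internal_nodes * n_features * n_thresholds * 2


# Illustrative settings only; these are not linear-tree's defaults.
deep = estimated_fits(n_features=80, n_thresholds=20, max_depth=5)
shallow = estimated_fits(n_features=80, n_thresholds=10, max_depth=3)
print(deep, shallow)  # 99200 11200
```

Under these assumptions, going from depth 5 with 20 thresholds to depth 3 with 10 thresholds cuts the count from 99,200 to 11,200 fits, roughly a 9x reduction, and the count is also linear in the number of features searched. Fits deeper in the tree run on fewer rows, so this counts fits rather than total arithmetic work.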

At the moment I don't plan to introduce this feature; I'm focusing on developing a special form of linear boosting.

All the best

Thanks for the info.