Evovest/EvoTrees.jl

Poorer results than XGBoost, LightGBM, CatBoost; could you help find better results?

MrBenzWorld opened this issue · 10 comments

I found EvoTrees is the only pure-Julia gradient boosting package. I appreciate it.
How do I do hyperparameter optimization?
You have compared code run time, but what about the quality of the results?
Please add an example comparing results against any real dataset, with hyperparameter optimization.

Thank you

For clarification, did you actually experience poorer results using EvoTrees.jl compared to xgboost and the others?
Note that max_depth in EvoTrees is equivalent to max_depth - 1 in xgboost.
For hyperparameter optimization, I'd recommend considering the MLJ framework, as EvoTrees.jl is meant to act purely as an algorithm, not a modeling framework. That being said, there's fit_evotree, which provides an early-stopping mechanism to find the optimal nrounds.
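To make the MLJ suggestion concrete, here is a minimal tuning sketch. The grid ranges and dataset are illustrative, not recommendations; it assumes MLJ's standard `range` / `TunedModel` API with the EvoTrees model:

```julia
using MLJ, EvoTrees

# Illustrative toy data; substitute your own X, y.
X, y = make_blobs(1_000)

model = EvoTreeClassifier()

# Declare which hyperparameters to search over (ranges are examples only).
r_depth = range(model, :max_depth, lower=4, upper=8)
r_eta   = range(model, :eta, lower=0.05, upper=0.3)

# Wrap the model in a self-tuning model: grid search, 5-fold CV, logloss.
tuned = TunedModel(model=model,
                   tuning=Grid(resolution=5),
                   range=[r_depth, r_eta],
                   resampling=CV(nfolds=5),
                   measure=log_loss)

mach = machine(tuned, X, y)
fit!(mach)
best = fitted_params(mach).best_model   # tuned EvoTreeClassifier
```

The same `TunedModel` wrapper works with any MLJ-registered model, so the comparison against XGBoost can reuse this setup unchanged.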

It's a good idea to provide some examples on real-world datasets. Would you have some suggestions for such datasets / examples?

Thank you for your response,
Please use any available Kaggle dataset (house pricing, etc.).
Examples: (https://www.kaggle.com/code/paulrohan2020/tutorial-lightgbm-xgboost-catboost-top-11/notebook)
(https://www.kaggle.com/code/jzeferino/house-pricing-catboost-vs-xgboost-vs-lightgbm)
EvoTrees is great, but there is no Kaggle competition or any other comparison available for quality of results.
I have used MLJ, but it doesn't support GPU. The TreeParzen.jl (pure Julia) optimizer is not working.

I have trained EvoTrees through MLJ but am not getting good results. I don't want to use Python; I want to use pure Julia, but I'm not finding a good package.
Please add an example and compare both speed and quality of results.

Julia claims it is fast and best for machine learning, yet I don't find a single Julia package in a top position.

This is a request and my feedback. Please help. Thank you

Please see here: CatBoost provides good material and examples for beginners; it is really helpful.
(https://github.com/catboost/tutorials)
But they don't have a Julia package (CatBoost.jl is not working).
I think EvoTrees is great and it can perform well. Please add some materials. Thank you

I haven't compared all cases, but I have run the following simple multiclass experiment (EvoTrees v0.14.6, MLJXGBoostInterface v0.3.4), where hopefully I've set the most important hyperparameters right:

using MLJ, EvoTrees, StableRNGs, MLJXGBoostInterface

n = 50_000
nrounds = 100
max_depth = 6
eta = 0.3
nbins = 32
lambda = 0
resampling = StratifiedCV(nfolds=10)
rng = StableRNG(123)
X, y = make_blobs(n, rng=rng)

evomach = machine(EvoTreeClassifier(nrounds=nrounds, max_depth=max_depth, eta=eta, lambda=lambda, nbins=nbins, rng=rng), X, y)
evores = evaluate!(evomach, measure=log_loss, resampling = resampling)

xgmach = machine(XGBoostClassifier(num_round=nrounds, max_depth=max_depth, eta=eta, lambda=lambda, max_bin=nbins, seed=1), X, y)
xgres = evaluate!(xgmach, measure=log_loss, resampling = resampling)

@info string("EvoTrees LogLoss: ", evores.measurement[1])
@info string("XGBoost LogLoss: ", xgres.measurement[1])

On this task, I find that EvoTrees is around 3 times faster than MLJXGBoostInterface. However, the logloss is almost an order of magnitude higher. I have tried changing the number of rounds, but that does not seem to help. Any idea what could explain this?

The only inconsistency in the above setting is that EvoTrees depth is equivalent to XGBoost depth + 1, so you'd need to set EvoTrees to 7 rather than 6. That being said, in the above scenario it shouldn't make much of a difference on either time or metric.

Given the very low resulting metric (0.02), it appears that the data is pretty much perfectly separable, resulting in 0.9999 / 0.0001 predictions. Due to exposure to underflow on Float32, EvoTrees' softmax & logistic predictions are floored at 1e-10. I couldn't perform the comparison with XGBoost given it's broken on Windows for Julia v1.8, but I suspect the metric difference may simply come from looser lower/upper bounds. Comparing metrics on a common clamping should be able to validate this.
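To illustrate what "a common clamping" means here, a minimal sketch (the helper name is mine, not part of EvoTrees or XGBoost): clamp both models' probabilities to the same floor before scoring, so differences in internal bounds no longer affect the logloss.

```julia
using Statistics

# Binary logloss with predictions clamped to a common lower/upper bound,
# so a model flooring at 1e-10 and one flooring elsewhere score alike.
function clamped_logloss(p::AbstractVector, y::AbstractVector; lo=1e-10)
    pc = clamp.(p, lo, 1 - lo)                       # common bounds
    return -mean(@. y * log(pc) + (1 - y) * log(1 - pc))
end
```

With extreme 1.0 / 0.0 predictions on a perfectly separable target, the clamped score collapses to roughly the floor itself, which is where the two libraries' bound choices would otherwise show up.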

Actually, there's another quite important difference. By default XGBoost uses the "exact" tree_method, while EvoTrees is all about the "hist" method. That would explain much of the observed difference, as nbins isn't actually used by XGBoost unless tree_method is set to "hist".
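A like-for-like rerun of the earlier experiment would then look as follows (assuming MLJXGBoostInterface passes tree_method through to xgboost, as its other parameters are; reusing the variables defined above):

```julia
# Same XGBoost setup as before, but forcing the histogram method so that
# max_bin=nbins is actually honored and the comparison is apples-to-apples.
xgmach_hist = machine(
    XGBoostClassifier(num_round=nrounds, max_depth=max_depth, eta=eta,
                      lambda=lambda, max_bin=nbins, tree_method="hist",
                      seed=1),
    X, y)
xgres_hist = evaluate!(xgmach_hist, measure=log_loss, resampling=resampling)
```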

Unless there's a specific concern about algorithm correctness, I'll close the issue.
I understand that more tutorials and examples are always welcome, though in the short term I don't expect to have time for them. Hopefully the docs' examples https://evovest.github.io/EvoTrees.jl/dev/examples-API/ or benchmarks such as https://github.com/Evovest/EvoTrees.jl/blob/main/experiments/benchmarks-regressor.jl can provide reasonable material.

I'd be happy to review tutorial additions to the docs if someone wants to open a PR.

Thanks Jeremie, I've run the two experiments you suggested. Using a Float64 EvoTrees model did not change the results, but switching XGBoost to "hist" indeed brought the logloss up to around 0.02, not really "significantly" different from EvoTrees. I will read more about those tree methods then.

Depending on the direction my project takes, I might rely quite a bit on gradient-boosted trees and will be happy to contribute to extending the docs etc. then.

Feel free to close.

The histogram method is typically the go-to whenever dealing with large datasets, like ~100K+ observations.
If you look at this outdated README page, whose speed comparison included XGBoost's exact method, you'll notice that at 1M observations it becomes really inefficient: https://github.com/Evovest/EvoTrees.jl/tree/v0.6.1

In practice, on real datasets, the binning mechanism of the histogram method typically acts as an efficient regularization mechanism to avoid overfitting.

In the above toy case with an easily separable target, you could approximate the exact method by cranking the number of bins up to the maximum supported, 255. You should observe outcomes closer to XGBoost's exact tree method in that case.
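Concretely, that is a one-parameter change to the EvoTrees model from the experiment above (reusing its variables):

```julia
# Same EvoTrees setup as before, but with the maximum supported bin
# count, approximating an exact split search on this toy dataset.
evomach255 = machine(
    EvoTreeClassifier(nrounds=nrounds, max_depth=max_depth, eta=eta,
                      lambda=lambda, nbins=255, rng=rng),
    X, y)
evores255 = evaluate!(evomach255, measure=log_loss, resampling=resampling)
```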

It's normal that using T=Float64 provided essentially the same results, as the lower/upper bound on the linear predictor is set to -10 regardless of whether the algorithm runs on Float32 or Float64. I did test running on Float64 with a modified lower bound, and it resulted in only a slightly improved outcome on the toy dataset, which is reassuring about the soundness of the bounds. It's also indicative that the main source of difference was exact vs hist, also confirmed when testing with 255 bins.

A word of caution: be careful about fitting a logistic regression where the class of interest is extremely rare, like in the 1e-4/1e-3 range. In such a scenario, I think one should consider some over/under-sampling approach, though that consideration likely applies to any algorithm.
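For completeness, one simple form such a scheme can take, sketched in plain Julia (the helper is illustrative, not an EvoTrees feature; dedicated packages handle this more carefully):

```julia
using Random

# Undersample the majority (negative) class so the rare positive class
# reaches `target_share` of the training rows; returns row indices to keep.
function undersample(y::AbstractVector{Bool}; target_share=0.2,
                     rng=Random.default_rng())
    pos = findall(y)                       # rare class: keep all
    neg = findall(!, y)                    # majority class: subsample
    nneg = round(Int, length(pos) * (1 - target_share) / target_share)
    keep = vcat(pos, shuffle(rng, neg)[1:min(end, nneg)])
    return sort(keep)
end
```

When training on the undersampled rows, predicted probabilities are biased upward and would need recalibrating if calibrated outputs matter, which is part of why this deserves care regardless of the algorithm.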

Closing, as the discrepancies came down to setting comparable parameters between the libraries.