Evovest/EvoTrees.jl

Binary classifier with probabilities of each class

Roh-codeur opened this issue · 5 comments

Hi

thanks for your work on this library and for making it open source. It's quite impressive indeed. I currently use XGBoost to train my model with a custom objective for binary classification; the model generates probabilities for each class.
XGBoost is very slow on my dataset (500k rows × 3k columns). The benchmarks you posted look super impressive, so I am quite keen to try out EvoTrees and see how it goes.

    boost = xgboost(
        dtrain,
        numberOfRounds,
        eta = learningRate,
        metrics = ["auc", "aucpr", "logloss"],
        obj = WeightedLoss,
        tree_method = "hist")
  1. I am wondering what the equivalent of the above would be in EvoTrees, please?
  2. Do you have plans to support the Apple M1 GPU as well, please?
  3. Also, do you have any quick tips and tricks for me? I am quite new to programming and machine learning.

thanks a lot!

For binary classification, the simplest would be to opt for a logistic regression:

config = EvoTreeRegressor(
    loss = :logistic,
    metric = :logloss,
    nrounds = 100,
    nbins = 32,
    lambda = 0.5,
    gamma = 0.1,
    eta = 0.1,
    max_depth = 6,
    rowsample = 0.5,
    colsample = 1.0)

model = fit_evotree(config; x_train, y_train, x_eval, y_eval, print_every_n = 25)

Such a model outputs a single probability, which is sufficient for binary cases.

Alternatively, for formal classification, EvoTreeClassifier can be used: https://evovest.github.io/EvoTrees.jl/dev/models/#EvoTreeClassifier. It will output the probability for each class, but will also likely be slower.

Note that EvoTrees' max_depth is equivalent to XGBoost's depth + 1. So if you use depth 5 in XGBoost, the equivalent is max_depth = 6 in EvoTrees.

I don't know about WeightedLoss; I assume it's your custom loss? EvoTrees doesn't have a formal API to specify custom losses, but it's fairly easy to implement a new one.

Since I don't use a Mac, I don't plan to add support for Metal / M1 / M2, as I already have a pretty long list of improvements I want to make for the more common CPU and CUDA use cases. However, I would naturally welcome any PR that adds support for Apple GPUs (it may come down to essentially copying the GPU source file with a few adaptations).

Hard to provide generic yet useful advice :) I think getting your hands on an actual project and working through it by implementing parts yourself can be a great way to figure out the challenges. Having something to measure against, as is the case with Kaggle datasets, can be useful for getting a sense of the relevant data-preparation and cross-validation approaches.

thanks a lot, mate! this is incredibly useful.

> Alternatively, for formal classification, EvoTreeClassifier can be used: https://evovest.github.io/EvoTrees.jl/dev/models/#EvoTreeClassifier. It will output the probability for each class, but will also likely be slower.

Sure, I will try it with the regressor. If I wanted to compare results or move to multiclass classification, I presume EvoTreeClassifier would follow a similar pattern? Can you please advise?

> I don't know about WeightedLoss; I assume it's your custom loss? EvoTrees doesn't have a formal API to specify custom losses, but it's fairly easy to implement a new one.

Yeah, WeightedLoss is indeed my custom objective function. I use it to help with class imbalance.

I understand your point about Apple M1. I will start off with the CPU; tbh, I am so impressed with the benchmarks.

I do indeed have a project where I currently use XGBoost. I am trying to replace XGBoost with EvoTrees and will post back with results.

@Roh-codeur Was there anything specific that needed fixing? Otherwise, I will close the issue.

@jeremiedb: thanks for your help with this. No fixes needed, I just had queries. I am still juggling a few things and have not had a chance to replace XGBoost with this yet; I will post back with results. In the meantime, can you please help with the below:

> Sure, I will try it with the regressor. If I wanted to compare results or move to multiclass classification, I presume EvoTreeClassifier would follow a similar pattern? Can you please advise?

ta!

There are now tutorials for classification and logistic regression in the latest docs: https://evovest.github.io/EvoTrees.jl/dev/