Binary classifier with probabilities of each class
Roh-codeur opened this issue · 5 comments
Hi
Thanks for your work on this library and for making it open-source; it's quite impressive indeed. I currently use XGBoost to train my model with a custom objective for binary classification. The model generates probabilities for each class.
XGBoost is very slow on my dataset (500k x 3k). The benchmarks you posted seem super impressive, so I am quite keen to try out EvoTrees and see how it goes.
boost = xgboost(
    dtrain,
    numberOfRounds,
    eta = learningRate,
    metrics = ["auc", "aucpr", "logloss"],
    obj = WeightedLoss,
    tree_method = "hist")
- I am wondering what the equivalent of the above would be in EvoTrees, please?
- Do you have plans to support the Apple M1 GPU as well, please?
- Also, do you have any quick tips and tricks for me? I am quite new to programming and machine learning.
thanks a lot!
For binary classification, the simplest would be to opt for a logistic regression:
config = EvoTreeRegressor(
    loss = :logistic,
    metric = :logloss,
    nrounds = 100,
    nbins = 32,
    lambda = 0.5,
    gamma = 0.1,
    eta = 0.1,
    max_depth = 6,
    rowsample = 0.5,
    colsample = 1.0)
model = fit_evotree(config; x_train, y_train, x_eval, y_eval, print_every_n = 25)
Such a model outputs a single probability, which is sufficient for binary cases.
Alternatively, for formal classification, EvoTreeClassifier can be used: https://evovest.github.io/EvoTrees.jl/dev/models/#EvoTreeClassifier. It will output the probability for each class, but will also likely be slower.
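For reference, a sketch of what the classifier variant might look like, reusing the pattern of the regressor config above (hyperparameter values are illustrative, not tuned, and the exact set of accepted keywords should be checked against the EvoTreeClassifier docs):

```julia
using EvoTrees

# Classifier counterpart of the regressor config above; values are
# illustrative. y_train should hold class labels rather than 0/1 floats.
config = EvoTreeClassifier(
    nrounds = 100,
    nbins = 32,
    eta = 0.1,
    max_depth = 6,
    rowsample = 0.5,
    colsample = 1.0)

model = fit_evotree(config; x_train, y_train, x_eval, y_eval, print_every_n = 25)

# predict returns one probability column per class
probs = EvoTrees.predict(model, x_eval)
```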
Note that EvoTrees' max_depth is equivalent to XGBoost's max_depth + 1. So if you have depth 5 in XGBoost, the equivalent is max_depth = 6 in EvoTrees.
I don't know about WeightedLoss; I assume it's your custom loss? EvoTrees doesn't have a formal API to specify custom losses, but it's fairly easy to implement a new one.
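To give a sense of what implementing a new loss involves, here is a generic illustration (plain Julia, not EvoTrees' internal API): gradient-boosting losses are defined by their gradient and hessian with respect to the raw score, and a weighted logistic loss simply scales both by the observation weight.

```julia
# Generic weighted logistic loss sketch (not EvoTrees' internal API):
# pred is the raw score (logit), y ∈ {0, 1}, w the observation weight.
sigmoid(x) = 1 / (1 + exp(-x))

# First and second derivatives of w * logloss w.r.t. pred:
grad(pred, y, w) = w * (sigmoid(pred) - y)
hess(pred, y, w) = w * sigmoid(pred) * (1 - sigmoid(pred))
```

A new loss in a boosting library typically comes down to supplying these two functions (plus an initial prediction) to the tree-building routine.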
Since I don't use a Mac, I don't plan to add support for Metal / M1 / M2, as I already have a pretty long list of things I want to see improved for the more common CPU & CUDA use cases. I would naturally welcome, however, any PR that adds support for Apple GPUs (it may come down to essentially a copy-paste of the GPU src file with a few adaptations).
- Hard to provide generic yet useful advice :) I think getting your hands on an actual project and going through it by implementing parts yourself can be a great way to figure out the challenges. Having something to measure against, as is the case with Kaggle datasets, can be useful to get a sense of the relevant data-preparation and cross-validation approaches.
thanks a lot, mate! this is incredibly useful.
Alternatively, for formal classification, EvoTreeClassifier can be used: https://evovest.github.io/EvoTrees.jl/dev/models/#EvoTreeClassifier. It will output the probability for each class, but will also likely be slower.
Sure, I will try it with the regressor. If I wanted to compare results or move to multi-class classification, I presume EvoTreeClassifier would follow a similar pattern? Can you please advise?
I don't know about WeightedLoss; I assume it's your custom loss? EvoTrees doesn't have a formal API to specify custom losses, but it's fairly easy to implement a new one.
Yeah, indeed, WeightedLoss is my custom objective function. I use it to help with class imbalance.
I understand your point around the Apple M1. I will start off with the CPU; tbh, I am so impressed with the benchmarks.
I do indeed have a project where I currently use XGBoost. I am trying to replace XGBoost with EvoTrees. I will post back results.
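As an aside on the class-imbalance point: if WeightedLoss only rebalances classes, per-observation weights may achieve the same effect without a custom loss. Assuming fit_evotree accepts a w_train keyword for observation weights (worth confirming against the EvoTrees docs), a sketch with inverse-frequency weights:

```julia
using EvoTrees, Statistics

# Up-weight the minority class with inverse-frequency weights
# (the w_train keyword is assumed here; check the fit_evotree docstring).
p_pos = mean(y_train)  # share of positive labels
w_train = ifelse.(y_train .== 1, 1 / p_pos, 1 / (1 - p_pos))

config = EvoTreeRegressor(loss = :logistic, nrounds = 100, eta = 0.1, max_depth = 6)
model = fit_evotree(config; x_train, y_train, w_train, x_eval, y_eval)
```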
@Roh-codeur Was there anything specific that needed fixing? Otherwise, I will close the issue.
@jeremiedb: thanks for your help with this. No fixes needed, I just had queries. I am still juggling a few things and have not had a chance to replace XGBoost with this; I will post back with results. In the meantime, can you please help with the below:
Sure, I will try it with the regressor. If I wanted to compare results or move to multi-class classification, I presume EvoTreeClassifier would follow a similar pattern? Can you please advise?
ta!
There are now tutorials for classification and logistic regression in the latest docs: https://evovest.github.io/EvoTrees.jl/dev/