Evovest/EvoTrees.jl

Feature/Tutorial Request: Hyperparameter tuning

Opened this issue · 5 comments

Grad student descent is definitely not fun, so it would be very nice to have a way to tune hyperparameters efficiently, and a tutorial on how to do this. (MLJTuning.jl lets you do it in theory, but only provides a handful of black-box optimizers like random or grid search.)

Are there specific hyper tuning methods you'd like to see covered?
With regard to a demonstration using the internal EvoTrees API, I'd tend to recommend a simple random search.
And for more specific tuning techniques, I'd tend to favor developing them in a mostly algorithm-agnostic way. MLJ seems like a good target in that regard. Do you see reasons to build more elaborate hyper tuning within a specific algo?
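To sketch what a simple random search looks like: sample each hyperparameter independently from its search space, evaluate each candidate on a validation split, and keep the best. A minimal illustration (shown in Python rather than Julia, and `eval_metric` is a hypothetical stub standing in for fit-then-score on a holdout set):

```python
import random

def eval_metric(params):
    # Hypothetical stand-in for: fit a model with these hyperparameters on a
    # training split, then score it on a validation split (lower is better).
    return (params["eta"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2

# Search space: continuous params get a sampler, discrete ones a choice list.
space = {
    "eta": lambda: 10 ** random.uniform(-3, 0),     # log-uniform learning rate
    "max_depth": lambda: random.choice([3, 4, 5, 6, 7, 8]),
}

def random_search(n_trials, seed=42):
    random.seed(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        candidate = {name: sample() for name, sample in space.items()}
        score = eval_metric(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

best, score = random_search(100)
```

The log-uniform sampler for the learning rate reflects the usual practice of searching scale-type parameters on a log scale; the specific ranges here are made up for illustration.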

Are there specific hyper tuning methods you'd like to see covered?

Mostly just a gradient method for the continuous parameters. Grid search should be fine for the discrete hyperparameters, given there are only one or two.
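For one or two discrete hyperparameters, exhaustive grid search is indeed cheap: just evaluate every combination and take the best. A tiny sketch (Python; `eval_metric` is again a hypothetical stub for a fit-then-validate step):

```python
from itertools import product

def eval_metric(eta, max_depth):
    # Hypothetical validation score for a (continuous, discrete) pair;
    # lower is better. Stubbed with a synthetic function for illustration.
    return (eta - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2

etas = [0.01, 0.05, 0.1, 0.3]   # continuous param on a coarse grid
depths = [4, 6, 8]              # discrete param, enumerated exhaustively

# Evaluate the full Cartesian product and keep the best combination.
best = min(product(etas, depths), key=lambda p: eval_metric(*p))
# best == (0.1, 6)
```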

Could you clarify the nature of the hyper search you're envisioning? I'm not clear how a gradient method could be applied here for hyper-search, as an EvoTrees loss function isn't differentiable with respect to its hyper-parameters. Perhaps you're referring to applying a gradient method to the eval-metric outcomes to inform the next hyper candidate to test?
Other than random search, my understanding is that Bayesian search may be the other most useful approach, but I may well have blind spots in my portrait of the hyper-tuning landscape.

Whoops, this is supposed to be in EvoLinear.jl 😅

(Although, I thought the loss was differentiable with respect to lambda? But I might be mixing that up with some other decision tree algorithm.)

Even in the context of EvoLinear, I'm not understanding how a gradient method would apply to hyper-parameter tuning.
Would you have an example (package/paper) of what you're trying to achieve?
Hyper-parameter tuning is typically about finding a hyper-parameter that leads to better generalization on an out-of-sample dataset. In that context, I have difficulty seeing how the feedback from the out-of-sample metric could be used to infer an update to the hyper-parameter. Taking a minimal use case, linear regression with L2 regularization: how would the L2 penalty be updated from the out-of-sample metric?