HunterMcGushion/hyperparameter_hunter

HPH Regression?

strelzoff-erdc opened this issue · 3 comments

Hi,
We just finished a nasty but eventually successful 9-month search on a high-dimensional, very non-linear, outlier-riddled, real-world regression modeling problem. While we have a breather for a few days, we're looking for a smarter way to evaluate and tune parameters for as many approaches as possible before our next problem drops.
We like the approach of HPH, and we see support for XGBRegressor and the SKLearn-wrapped Keras regressor, but no example of hyperparameter experiments or optimization with regression rather than classification. Are we missing something obvious? Is there an example? Or is this the wrong tool (at least for now)?

thanks

Hello, thank you for your interest in HyperparameterHunter!

HyperparameterHunter does work with regression tasks. I apologize; not including any regression examples was a big oversight, and I'm working on some to add right now.

One thing to note is that there is currently an issue with optimizing and sorting leaderboards when using error/loss metrics (#34), but there is a simple workaround until #91 is merged into the master branch, which I expect to happen within a couple of days.
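For reference, the workaround (the same idea as the negated lambdas in the metrics_map of the example below) is just to wrap the metric so its sign is flipped. A minimal sketch:

from sklearn.metrics import mean_absolute_error

# Sketch of the temporary #34 workaround: negate the error metric inside a wrapper,
# then pass it via metrics_map, e.g. metrics_map=dict(mean_absolute_error=negated_mae)
def negated_mae(targets, predictions):
    return -mean_absolute_error(targets, predictions)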

Since you mentioned XGBRegressor compatibility, here are a couple of quick examples demonstrating that it works.

from hyperparameter_hunter import Environment, CrossValidationExperiment
from hyperparameter_hunter import BayesianOptimization, Integer, Real, Categorical
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, r2_score

env = Environment(
    train_dataset=get_diabetes_data(),  # Loader defined below
    root_results_path="HyperparameterHunterAssets",  # Where experiment results are saved
    metrics_map=dict(
        # mean_absolute_error=lambda t, p: -mean_absolute_error(t, p),
        r2_score=lambda t, p: -r2_score(t, p),  # Negated -- see the #34 workaround note above
    ),
    cross_validation_type="KFold",
    cross_validation_params=dict(n_splits=10, shuffle=True, random_state=32),
)

##################################################
# Experiment
##################################################
# This experiment's results are saved under root_results_path, so the optimizer below can find them
experiment = CrossValidationExperiment(
    model_initializer=XGBRegressor,
    model_init_params=dict(objective="reg:linear", max_depth=4, n_estimators=400, subsample=0.5),
    model_extra_params=dict(fit=dict(eval_metric="mae")),
)

##################################################
# Hyperparameter Optimization
##################################################
# read_experiments=True tells the optimizer to learn from saved experiment results
optimizer = BayesianOptimization(iterations=20, read_experiments=True, random_state=32)

optimizer.set_experiment_guidelines(
    model_initializer=XGBRegressor,
    model_init_params=dict(
        max_depth=Integer(2, 20),
        learning_rate=Real(0.01, 0.7),
        n_estimators=Integer(100, 500),
        subsample=0.5,  # Plain values (not Integer/Real/Categorical) are held constant
        booster=Categorical(["gbtree", "gblinear"]),
    ),
    model_extra_params=dict(fit=dict(eval_metric=Categorical(["rmse", "mae"]))),
)

optimizer.go()

By running all of the above, you can also see one of HyperparameterHunter's most important features: the optimizer automatically finds the result of the earlier experiment and learns from it during optimization. Running this example repeatedly produces a larger pool of past experiments that are automatically learned from each time, dramatically improving hyperparameter optimization over a project's lifespan.
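To make that concrete, here is a rough sketch of what a later session might look like (it just repeats the Environment settings and search space from above, nothing new API-wise); since read_experiments=True and the results directory is unchanged, the saved experiments are read back in and used to seed the new round of optimization:

from hyperparameter_hunter import Environment, BayesianOptimization, Integer, Real, Categorical
from xgboost import XGBRegressor
from sklearn.metrics import r2_score

# Re-create the same Environment so the results saved under "HyperparameterHunterAssets"
# are recognized, then run another round of optimization that starts from them
env = Environment(
    train_dataset=get_diabetes_data(),
    root_results_path="HyperparameterHunterAssets",
    metrics_map=dict(r2_score=lambda t, p: -r2_score(t, p)),
    cross_validation_type="KFold",
    cross_validation_params=dict(n_splits=10, shuffle=True, random_state=32),
)

optimizer = BayesianOptimization(iterations=10, read_experiments=True, random_state=32)
optimizer.set_experiment_guidelines(
    model_initializer=XGBRegressor,
    model_init_params=dict(
        max_depth=Integer(2, 20),
        learning_rate=Real(0.01, 0.7),
        n_estimators=Integer(100, 500),
        subsample=0.5,
        booster=Categorical(["gbtree", "gblinear"]),
    ),
    model_extra_params=dict(fit=dict(eval_metric=Categorical(["rmse", "mae"]))),
)
optimizer.go()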

Here's the data loader I used above to format SKLearn's Diabetes regression dataset. It will be added to HyperparameterHunter along with the regression examples shortly.

import pandas as pd
from sklearn.datasets import load_diabetes

def get_diabetes_data():
    """Load scikit-learn's diabetes dataset as a DataFrame with a "target" column."""
    data = load_diabetes()
    df = pd.DataFrame(data=data.data, columns=[_.replace(" ", "_") for _ in data.feature_names])
    df["target"] = data.target
    return df
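Just to sanity-check the loader (the expected shape below is an assumption on my part, based on the standard diabetes dataset):

df = get_diabetes_data()
print(df.shape)        # Expected (442, 11): 10 feature columns plus "target"
print(df.columns[-1])  # "target", which I believe matches Environment's default target column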

Please note that in the above examples I made no attempt to select effective hyperparameters; they're there just for illustration, so they may perform poorly.

I hope this helps answer your question. If you have any other questions, please let me know!

@HunterMcGushion Thanks so much! HyperparameterHunter now running on the supercomputer. We'll let you know how it turns out.

Glad to hear it! Thank you! I always welcome suggestions, issues, or PRs. Whether or not you end up using HyperparameterHunter, good luck on your next hyperparameter search!