Different results between Optuna best value and re-training
Closed this issue · 3 comments
Data split:
data = normalized_df_ros.to_numpy()
target = y_train_oversample
train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25, random_state = 52)
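As a side note, with a fixed random_state the split itself is deterministic, so the data split cannot explain a score difference between runs. A minimal self-contained check (synthetic arrays stand in for normalized_df_ros and y_train_oversample, whose real shapes are assumed here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumed shapes) for the real feature and target arrays.
data = np.arange(40).reshape(20, 2)
target = np.array([0, 1] * 10)

# Two calls with the same random_state produce identical splits.
a_train, a_valid, ay_train, ay_valid = train_test_split(
    data, target, test_size=0.25, random_state=52
)
b_train, b_valid, by_train, by_valid = train_test_split(
    data, target, test_size=0.25, random_state=52
)
assert np.array_equal(a_train, b_train) and np.array_equal(a_valid, b_valid)
```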
I have used the code below for Optuna hyperparameter tuning:
def objective_ros(trial):
    dtrain = xgb.DMatrix(train_x, label=train_y)
    dvalid = xgb.DMatrix(valid_x, label=valid_y)
    param = {
        "objective": "binary:logistic",
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "lambda": trial.suggest_loguniform("lambda", 1e-8, 1.0),
        "learning_rate": trial.suggest_loguniform("alpha", 1e-8, 1.0),
        "subsample": trial.suggest_float("subsample", 0.05, 1.0),
    }
    if param["booster"] == "gbtree" or param["booster"] == "dart":
        param["max_depth"] = trial.suggest_int("max_depth", 1, 9)
        param["eta"] = trial.suggest_loguniform("eta", 1e-8, 1.0)
        param["gamma"] = trial.suggest_loguniform("gamma", 1e-8, 1.0)
        param["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])
    if param["booster"] == "dart":
        param["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])
        param["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])
        param["rate_drop"] = trial.suggest_loguniform("rate_drop", 1e-8, 1.0)
        param["skip_drop"] = trial.suggest_loguniform("skip_drop", 1e-8, 1.0)
    bst = xgb.XGBClassifier(**param, random_state=52)
    bst.fit(valid_x, valid_y)
    preds = bst.predict(valid_x)
    f1 = f1_score(valid_y, preds, average='micro')
    return f1
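For reference, the usual objective pattern fits on the training split and evaluates on the held-out split. A minimal self-contained sketch of that pattern (under assumptions: scikit-learn's GradientBoostingClassifier and synthetic data stand in for xgboost and the real dataset, and `params` is a plain dict rather than Optuna trial suggestions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (hypothetical, for illustration only).
rng = np.random.default_rng(52)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
train_x, valid_x, train_y, valid_y = train_test_split(
    X, y, test_size=0.25, random_state=52
)

def objective(params):
    clf = GradientBoostingClassifier(random_state=52, **params)
    clf.fit(train_x, train_y)      # fit on the training split ...
    preds = clf.predict(valid_x)   # ... and score on the held-out split
    return f1_score(valid_y, preds, average="micro")

score = objective({"max_depth": 3})
assert 0.0 <= score <= 1.0
```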
if __name__ == "__main__":
    """
    Optuna creates the study to optimize the objective function for the given
    number of trials and returns the best trial, i.e. the one with the maximum
    F1 score, along with that trial's hyperparameters.
    """
    study = optuna.create_study(
        pruner=optuna.pruners.MedianPruner(n_warmup_steps=5), direction="maximize"
    )
    study.optimize(objective_ros, n_trials=3)
    print(study.best_trial)
    print("Number of finished trials: {}".format(len(study.trials)))
    print("Best trial:")
    trial = study.best_trial
    print("  F1 Score: {}".format(trial.value))
    print("  Best Hyperparameters: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
Output:
FrozenTrial(number=2, state=TrialState.COMPLETE, values=[0.9856801909307876], datetime_start=datetime.datetime(2024, 5, 8, 15, 35, 16, 222522), datetime_complete=datetime.datetime(2024, 5, 8, 15, 36, 2, 629269), params={'booster': 'dart', 'lambda': 0.002311280531979015, 'alpha': 0.0002916755277581915, 'subsample': 0.9746150609390157, 'max_depth': 6, 'eta': 1.2123850647079977e-05, 'gamma': 7.277285195329304e-05, 'grow_policy': 'lossguide', 'sample_type': 'weighted', 'normalize_type': 'tree', 'rate_drop': 0.0010080385982105485, 'skip_drop': 0.002532401998679777}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'booster': CategoricalDistribution(choices=('gbtree', 'gblinear', 'dart')), 'lambda': FloatDistribution(high=1.0, log=True, low=1e-08, step=None), 'alpha': FloatDistribution(high=1.0, log=True, low=1e-08, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.05, step=None), 'max_depth': IntDistribution(high=9, log=False, low=1, step=1), 'eta': FloatDistribution(high=1.0, log=True, low=1e-08, step=None), 'gamma': FloatDistribution(high=1.0, log=True, low=1e-08, step=None), 'grow_policy': CategoricalDistribution(choices=('depthwise', 'lossguide')), 'sample_type': CategoricalDistribution(choices=('uniform', 'weighted')), 'normalize_type': CategoricalDistribution(choices=('tree', 'forest')), 'rate_drop': FloatDistribution(high=1.0, log=True, low=1e-08, step=None), 'skip_drop': FloatDistribution(high=1.0, log=True, low=1e-08, step=None)}, trial_id=2, value=None)
Number of finished trials: 3
Best trial:
F1 Score: 0.9856801909307876
Best Hyperparameters:
booster: dart
lambda: 0.002311280531979015
alpha: 0.0002916755277581915
subsample: 0.9746150609390157
max_depth: 6
eta: 1.2123850647079977e-05
gamma: 7.277285195329304e-05
grow_policy: lossguide
sample_type: weighted
normalize_type: tree
rate_drop: 0.0010080385982105485
skip_drop: 0.002532401998679777
Using the best hyperparameters from Optuna, I re-trained the 'XGBClassifier'.
Code:
clf = xgb.XGBClassifier(**best_params, random_state = 52)
clf.fit(train_x, train_y)
valid_pred_ros = clf.predict(valid_x)
f1 = f1_score(valid_y, valid_pred_ros, average='micro')
print("validation f1_score : ", f1)
Output:
validation f1_score : 0.937947494033413
Now, you can see the best value from Optuna is F1 Score: 0.9856801909307876, whereas the validation F1-score of the model trained with the same hyperparameters and the same XGBClassifier is validation f1_score : 0.937947494033413.
My question is: why are the values different here even though the parameters are the same?
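A likely contributor, going by the code above rather than a confirmed answer: the objective fits the model on valid_x and also predicts on valid_x, so Optuna's best value is an in-sample score, while the re-trained model is fit on train_x and evaluated out-of-sample on valid_x. (A second subtlety: the objective maps the suggested name "alpha" to the "learning_rate" key, so passing the best params dict straight to XGBClassifier sets alpha rather than learning_rate, which also changes the effective configuration.) A minimal sketch of the in-sample inflation effect, with synthetic data and scikit-learn's GradientBoostingClassifier standing in for XGBClassifier:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic noisy binary labels (hypothetical data, for illustration only).
rng = np.random.default_rng(52)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
train_x, valid_x, train_y, valid_y = train_test_split(
    X, y, test_size=0.25, random_state=52
)

params = {"max_depth": 6, "random_state": 52}

# In-sample: fit AND evaluate on the same validation split (as the objective does).
in_sample = GradientBoostingClassifier(**params).fit(valid_x, valid_y)
f1_in = f1_score(valid_y, in_sample.predict(valid_x), average="micro")

# Out-of-sample: fit on the training split, evaluate on the held-out split.
out_sample = GradientBoostingClassifier(**params).fit(train_x, train_y)
f1_out = f1_score(valid_y, out_sample.predict(valid_x), average="micro")

assert f1_in >= f1_out  # the in-sample score is typically inflated
```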
Could you post the question on https://github.com/optuna/optuna/discussions, since it is not related to this repo, optuna-example?
Done.
Thanks!