optuna/optuna-examples

Do we have an option to tune Keras with 5-fold cross-validation?

talhaanwarch opened this issue · 10 comments

I am trying to optimize a model, but it worked only on the fold I optimized on and failed on the rest of the folds.

Hi, could you share minimal reproducible code with us?

Anyway, the short answer to the question in the title is no: there is no built-in option, so the cross-validation loop has to be written inside the objective yourself.

I do something like the following:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def ourmodel(param):
    inp = Input(...)  # input shape elided in the original post
    x = Dense(param, activation='relu')(inp)  # nonlinearity assumed between the stacked Dense layers
    out = Dense(1, activation='sigmoid')(x)  # sigmoid so the output matches binary_crossentropy
    model = Model(inputs=inp, outputs=out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

# data_array, label_array, and group_array are assumed to be defined elsewhere;
# the original post also used a global scaler, assumed here to be a StandardScaler.
scaler = StandardScaler()

def crossval(param):
    gkf = GroupKFold()
    score = []
    for train_index, val_index in gkf.split(data_array, label_array, groups=group_array):
        train_features, train_labels = data_array[train_index], label_array[train_index]
        val_features, val_labels = data_array[val_index], label_array[val_index]
        # Fit the scaler on the training fold only, then apply it to the validation fold.
        train_features = scaler.fit_transform(train_features)
        val_features = scaler.transform(val_features)
        model = ourmodel(param)
        model.fit(train_features, train_labels, epochs=50, batch_size=1024 * 8,
                  validation_data=(val_features, val_labels), verbose=0)
        res = model.evaluate(val_features, val_labels, verbose=0)[1]  # index 1 is the accuracy metric
        score.append(res)

    avg = np.mean(score)
    return avg

def objective(trial):
    units = trial.suggest_categorical("units", [3, 5, 7, 9])
    score = crossval(units)
    return score

import optuna
if __name__ == "__main__":

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)

    print("Number of finished trials: {}".format(len(study.trials)))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: {}".format(trial.value))

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))

I think the code looks fine. Could you elaborate on your question?

If you are suffering from OOM errors, keras.backend.clear_session might help you.
It explicitly clears the computation graphs kept by Keras, so please call it at the end of the for-loop in the crossval function.

from tensorflow.keras.backend import clear_session

# Clear clutter from previous Keras session graphs.
clear_session()
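
For concreteness, here is a sketch of the same crossval loop with the call added at the end of each fold (same assumptions about data_array, label_array, group_array, and the scaler as above):

from tensorflow.keras.backend import clear_session

def crossval(param):
    gkf = GroupKFold()
    score = []
    for train_index, val_index in gkf.split(data_array, label_array, groups=group_array):
        train_features, train_labels = data_array[train_index], label_array[train_index]
        val_features, val_labels = data_array[val_index], label_array[val_index]
        train_features = scaler.fit_transform(train_features)
        val_features = scaler.transform(val_features)
        model = ourmodel(param)
        model.fit(train_features, train_labels, epochs=50, batch_size=1024 * 8,
                  validation_data=(val_features, val_labels), verbose=0)
        res = model.evaluate(val_features, val_labels, verbose=0)[1]
        score.append(res)
        # Drop this fold's graph so the next fold starts from a clean session.
        clear_session()

    return np.mean(score)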

> I think the code looks fine. Could you elaborate on your question?

If I do a train/test split with a 0.2 or 0.3 ratio, then after the search I can get good results only on that split, and the model flops if I test it on some other split.

Thanks. It sounds like overfitting. Does this happen when replacing gkf=GroupKFold() with gkf=KFold()? If it does not, group_array might be set inappropriately, so that each model only sees a subset of labels or imbalanced data.
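
For example, a quick diagnostic along these lines (a sketch; it assumes the data_array, label_array, and group_array from your post, with integer class labels) would show whether the folds are balanced:

import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# KFold.split accepts and ignores the groups argument, so both splitters
# can be driven through the same call.
for name, splitter in [("GroupKFold", GroupKFold()), ("KFold", KFold())]:
    print(name)
    for fold, (_, val_index) in enumerate(splitter.split(data_array, label_array, groups=group_array)):
        counts = np.bincount(label_array[val_index].astype(int))
        print("  fold {}: validation label counts {}".format(fold, counts))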

Ah, thank you for your explanation. So it might depend on your optimisation problem.

I believe that I've answered the original question, and the provided code looks fine. Since we don't know anything about the dataset, we cannot give any further advice. Therefore, let me close this issue.