Do we have an option to tune Keras with 5-fold cross validation?
talhaanwarch opened this issue · 10 comments
I am trying to optimize a model, but it only works on the fold I optimized on and fails on the rest of the folds.
Hi, could you share minimal reproducible codes with us?
Anyway, the short answer to the question in the title is no.
I do something like the following:
def ourmodel(param):
    inp = Input(...)  # input shape elided in the original
    x = Dense(param)(inp)
    out = Dense(1, activation='sigmoid')(x)  # sigmoid needed for binary_crossentropy
    model = Model(inputs=inp, outputs=out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
def crossval(param):
    gkf = GroupKFold()  # 5 splits by default
    scaler = StandardScaler()  # the scaler was used but never defined in the original
    score = []
    for train_index, val_index in gkf.split(data_array, label_array, groups=group_array):
        train_features, train_labels = data_array[train_index], label_array[train_index]
        val_features, val_labels = data_array[val_index], label_array[val_index]
        train_features = scaler.fit_transform(train_features)
        val_features = scaler.transform(val_features)
        model = ourmodel(param)
        model.fit(train_features, train_labels, epochs=50, batch_size=1024 * 8,
                  validation_data=(val_features, val_labels), verbose=0)
        res = model.evaluate(val_features, val_labels, verbose=0)[1]
        score.append(res)
    avg = np.mean(score)
    return avg
def objective(trial):
    units = trial.suggest_categorical("units", [3, 5, 7, 9])
    score = crossval(units)
    return score
import optuna

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)

    print("Number of finished trials: {}".format(len(study.trials)))
    print("Best trial:")
    trial = study.best_trial
    print("  Value: {}".format(trial.value))
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
I think the code looks fine. Could you elaborate on your question?
If you are suffering from OOM errors, keras.backend.clear_session might help you. It explicitly clears the Keras computation graph, so please call it at the end of the for-loop in the crossval function.
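Following that advice, the cross-validation loop might look like this (a minimal self-contained sketch; the layer sizes and epoch count are placeholders, not the original model):

```python
import numpy as np
from tensorflow import keras
from sklearn.model_selection import GroupKFold

def crossval_with_cleanup(param, data_array, label_array, group_array):
    """Cross-validate, clearing the Keras graph after each fold to avoid OOM."""
    gkf = GroupKFold()  # 5 splits by default
    scores = []
    for train_idx, val_idx in gkf.split(data_array, label_array, groups=group_array):
        model = keras.Sequential([
            keras.layers.Dense(param, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(data_array[train_idx], label_array[train_idx],
                  epochs=2, verbose=0)
        scores.append(model.evaluate(data_array[val_idx], label_array[val_idx],
                                     verbose=0)[1])
        keras.backend.clear_session()  # free the graph built for this fold
    return float(np.mean(scores))
```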
optuna-examples/keras/keras_simple.py
Lines 38 to 39 in 07c77d8
If I do a train/test split with a 0.2 or 0.3 ratio, then after the search I get good results only on that split, and the model flops if I test it on some other split.
Thanks. That sounds like overfitting. Does the problem go away when you replace gkf=GroupKFold()
with gkf=KFold()
? If so, group_array
might be set inappropriately; as a result, each model only sees a subset of the labels or imbalanced data.
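One quick way to check this is to compare the label distribution each training fold sees under the two splitters. A minimal sketch with a synthetic group_array in which the label depends entirely on the group:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

def fold_label_ratios(splitter, X, y, groups=None):
    """Return the fraction of positive labels in each training fold."""
    return [float(y[train_idx].mean())
            for train_idx, _ in splitter.split(X, y, groups=groups)]

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 20)   # 5 groups of 20 samples
y = (groups >= 3).astype(int)          # label determined by the group
X = rng.normal(size=(100, 2))

# Holding out a whole group skews the training labels...
print(fold_label_ratios(GroupKFold(), X, y, groups))
# ...while a shuffled KFold keeps them close to the overall 0.4.
print(fold_label_ratios(KFold(shuffle=True, random_state=0), X, y))
```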
Ah, thank you for your explanation. So it might depend on your optimisation problem.
I believe I've answered the original question, and the provided code looks fine. Since we don't know the details of the dataset, we can't give any further replies, so let me close this issue.