Optuna example (fastai v2): is the example maximizing the validation loss?
maxiswiz opened this issue · 4 comments
Hello,
I just ran the code provided by optuna-examples and it turns out that the validation loss is being maximized.
Considering

```python
pruner = optuna.pruners.MedianPruner(n_startup_trials=2)
study = optuna.create_study(direction="maximize", pruner=pruner)
study.optimize(objective, n_trials=100, timeout=1000)
```
Trials 0 and 1 are not pruned, since `n_startup_trials` is set to 2; they report validation losses of 0.213294 and 0.158786 respectively at epoch/step 0. Trial 2 gets pruned at epoch 0 with a validation loss of 0.113877, the lowest so far. Yet trial 3, with a validation loss of 0.227257 at epoch 0, runs to the end! I understand the goal of maximizing the accuracy on the validation set (it is the value returned by the `objective` function). However, the direction set for the study appears to maximize the validation loss at pruning time, which feels very counter-intuitive.
Replication should be easy, as the code from the optuna example is self-contained.
I hope I am not misunderstanding things along the way.
Best regards,
Maxime
Here are some more details:
|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.471152 | 0.213294 | 0.959274 | 00:03|
|1 | 0.251552 | 0.085383 | 0.985770 | 00:03|
|2 | 0.126371 | 0.055183 | 0.988224 | 00:03|
|3 | 0.076224 | 0.039957 | 0.990186 | 00:03|
|4 | 0.056587 | 0.030830 | 0.990677 | 00:03|
|5 | 0.047404 | 0.023672 | 0.993131 | 00:03|
|6 | 0.039950 | 0.022306 | 0.992149 | 00:03|
|7 | 0.041833 | 0.020039 | 0.993621 | 00:03|
|8 | 0.035657 | 0.018003 | 0.993131 | 00:03|
|9 | 0.030431 | 0.015924 | 0.995093 | 00:03|
[I 2021-06-23 16:54:52,062] Trial 0 finished with value: 0.9950932264328003 and parameters: {'apply_tfms': True, 'max_rotate': 1, 'max_zoom': 1.242046338907775, 'p_affine': 0.7000000000000001, 'n_layers': 5, 'n_channels_0': 4, 'n_channels_1': 9, 'n_channels_2': 11, 'n_channels_3': 9, 'n_channels_4': 21}. Best is trial 0 with value: 0.9950932264328003.
|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.391116 | 0.158786 | 0.964181 | 00:03|
|1 | 0.255199 | 0.098080 | 0.971541 | 00:03|
|2 | 0.208814 | 0.077618 | 0.975957 | 00:03|
|3 | 0.178489 | 0.066051 | 0.980373 | 00:03|
|4 | 0.158369 | 0.061916 | 0.978901 | 00:03|
|5 | 0.146922 | 0.060670 | 0.984298 | 00:03|
|6 | 0.137263 | 0.062824 | 0.975957 | 00:03|
|7 | 0.130735 | 0.044604 | 0.986752 | 00:03|
|8 | 0.124975 | 0.048074 | 0.990186 | 00:03|
|9 | 0.120322 | 0.056707 | 0.979392 | 00:03|
[I 2021-06-23 16:55:32,077] Trial 1 finished with value: 0.9793915748596191 and parameters: {'apply_tfms': True, 'max_rotate': 32, 'max_zoom': 1.6189198430838707, 'p_affine': 0.30000000000000004, 'n_layers': 2, 'n_channels_0': 22, 'n_channels_1': 15}. Best is trial 0 with value: 0.9950932264328003.
|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.189745 | 0.113877 | 0.960746 | 00:02|
[I 2021-06-23 16:55:36,113] Trial 2 pruned. Trial was pruned at epoch 0.
|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.483240 | 0.227257 | 0.973013 | 00:03|
|1 | 0.267175 | 0.114635 | 0.980864 | 00:03|
|2 | 0.154494 | 0.062144 | 0.990677 | 00:03|
|3 | 0.103251 | 0.037902 | 0.991168 | 00:03|
|4 | 0.083039 | 0.037244 | 0.992640 | 00:03|
|5 | 0.070180 | 0.028697 | 0.991168 | 00:03|
|6 | 0.059144 | 0.024078 | 0.993131 | 00:03|
|7 | 0.055209 | 0.022824 | 0.993131 | 00:03|
|8 | 0.052700 | 0.020156 | 0.994112 | 00:03|
|9 | 0.046110 | 0.023263 | 0.991659 | 00:03|
[I 2021-06-23 16:56:12,867] Trial 3 finished with value: 0.9916585087776184 and parameters: {'apply_tfms': True, 'max_rotate': 45, 'max_zoom': 1.0501142679315925, 'p_affine': 0.8, 'n_layers': 5, 'n_channels_0': 16, 'n_channels_1': 6, 'n_channels_2': 10, 'n_channels_3': 13, 'n_channels_4': 16}. Best is trial 0 with value: 0.9950932264328003.
Overall, there seems to be a conflict between the accuracy, which should be maximized, and the validation loss, which should be minimized. In the code snippet provided by the optuna example, pruning ends up maximizing the validation loss.
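The pruning decisions above can be reproduced with a quick sketch. Under `direction="maximize"`, a median pruner treats below-median intermediate values as "bad" and prunes them; this is a simplified re-creation of that comparison (the real logic lives in `optuna.pruners.MedianPruner`), fed the epoch-0 validation losses from the logs:

```python
import statistics

# Epoch-0 valid_loss values from the startup trials (trials 0 and 1),
# which are never pruned because n_startup_trials=2.
completed = [0.213294, 0.158786]

def should_prune_maximize(value, history):
    # With direction="maximize", a median pruner prunes a trial whose
    # intermediate value falls below the median of prior trials at the
    # same step -- regardless of whether the value is a loss.
    return value < statistics.median(history)

# Trial 2 reported the *lowest* (best) loss at epoch 0, yet it is pruned:
print(should_prune_maximize(0.113877, completed))  # True
# Trial 3 reported the *highest* (worst) loss at epoch 0, yet it survives:
print(should_prune_maximize(0.227257, completed))  # False
```

This reproduces exactly the pruning pattern in the logs: monitoring a loss under a "maximize" direction inverts the meaning of the comparison.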
@maxiswiz Thank you for raising this question about the fastai example, with informative logs that make the problem easy to understand. I certainly agree with you.

The default value of `monitor` in `FastAIV2PruningCallback` is `valid_loss`, but the direction of optimization is `maximize` in this example, as you mentioned. I think we should set `accuracy` as the `monitor` value of `FastAIV2PruningCallback`.

I think the fastai v1 example has the same issue.
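To see why monitoring `accuracy` would be consistent, the same simplified median comparison can be re-run on the epoch-0 accuracies from the logs (again a sketch mirroring, not reusing, Optuna's `MedianPruner`; the actual fix is simply passing `monitor="accuracy"` to `FastAIV2PruningCallback`):

```python
import statistics

# Suggested fix in the example's objective function (not executed here):
#   FastAIV2PruningCallback(trial, monitor="accuracy")
# Below, the epoch-0 accuracies of the startup trials (trials 0 and 1).
completed_acc = [0.959274, 0.964181]

def should_prune_maximize(value, history):
    # Prune when the intermediate value is below the median of prior
    # trials at the same step; sensible when higher is better.
    return value < statistics.median(history)

# Trial 2 (accuracy 0.960746) is slightly below the median, so pruning
# it is now a defensible decision rather than an inverted one:
print(should_prune_maximize(0.960746, completed_acc))  # True
# Trial 3 (accuracy 0.973013) is above the median and is kept:
print(should_prune_maximize(0.973013, completed_acc))  # False
```

With the monitored value and the study direction aligned, the pruner's decisions agree with the objective being optimized.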