optuna/optuna-examples

Optuna example (fastai v2): is the example maximizing validation loss?

maxiswiz opened this issue · 4 comments

Hello,

I just ran the code provided by optuna-examples and it turns out that the validation loss is being maximized.

Considering

```python
pruner = optuna.pruners.MedianPruner(n_startup_trials=2)
study = optuna.create_study(direction="maximize", pruner=pruner)
study.optimize(objective, n_trials=100, timeout=1000)
```

Trials 0 and 1 are not pruned, as n_startup_trials is set to 2. They get validation losses of 0.213294 and 0.158786 respectively at epoch/step 0. Trial 2 is pruned at epoch 0 with a validation loss of 0.113877, yet trial 3 runs to the end despite a validation loss of 0.227257 at epoch 0! I understand the goal of maximizing the accuracy on the validation set (it is the value returned by the objective function). However, the direction set for the study seems to make the pruner maximize the validation loss at pruning time, which feels very counter-intuitive.

Replication should be easy as the code from the optuna example is self-contained.

I hope I am not misunderstanding things along the way.

Best regards,
Maxime

Here are some more details:

|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.471152 | 0.213294 | 0.959274 | 00:03|
|1 | 0.251552 | 0.085383 | 0.985770 | 00:03|
|2 | 0.126371 | 0.055183 | 0.988224 | 00:03|
|3 | 0.076224 | 0.039957 | 0.990186 | 00:03|
|4 | 0.056587 | 0.030830 | 0.990677 | 00:03|
|5 | 0.047404 | 0.023672 | 0.993131 | 00:03|
|6 | 0.039950 | 0.022306 | 0.992149 | 00:03|
|7 | 0.041833 | 0.020039 | 0.993621 | 00:03|
|8 | 0.035657 | 0.018003 | 0.993131 | 00:03|
|9 | 0.030431 | 0.015924 | 0.995093 | 00:03|


[I 2021-06-23 16:54:52,062] Trial 0 finished with value: 0.9950932264328003 and parameters: {'apply_tfms': True, 'max_rotate': 1, 'max_zoom': 1.242046338907775, 'p_affine': 0.7000000000000001, 'n_layers': 5, 'n_channels_0': 4, 'n_channels_1': 9, 'n_channels_2': 11, 'n_channels_3': 9, 'n_channels_4': 21}. Best is trial 0 with value: 0.9950932264328003.

|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.391116 | 0.158786 | 0.964181 | 00:03|
|1 | 0.255199 | 0.098080 | 0.971541 | 00:03|
|2 | 0.208814 | 0.077618 | 0.975957 | 00:03|
|3 | 0.178489 | 0.066051 | 0.980373 | 00:03|
|4 | 0.158369 | 0.061916 | 0.978901 | 00:03|
|5 | 0.146922 | 0.060670 | 0.984298 | 00:03|
|6 | 0.137263 | 0.062824 | 0.975957 | 00:03|
|7 | 0.130735 | 0.044604 | 0.986752 | 00:03|
|8 | 0.124975 | 0.048074 | 0.990186 | 00:03|
|9 | 0.120322 | 0.056707 | 0.979392 | 00:03|

[I 2021-06-23 16:55:32,077] Trial 1 finished with value: 0.9793915748596191 and parameters: {'apply_tfms': True, 'max_rotate': 32, 'max_zoom': 1.6189198430838707, 'p_affine': 0.30000000000000004, 'n_layers': 2, 'n_channels_0': 22, 'n_channels_1': 15}. Best is trial 0 with value: 0.9950932264328003.


|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.189745 | 0.113877 | 0.960746 | 00:02|

[I 2021-06-23 16:55:36,113] Trial 2 pruned. Trial was pruned at epoch 0.


|epoch | train_loss | valid_loss | accuracy | time|
|--- | --- | --- | --- | ---|
|0 | 0.483240 | 0.227257 | 0.973013 | 00:03|
|1 | 0.267175 | 0.114635 | 0.980864 | 00:03|
|2 | 0.154494 | 0.062144 | 0.990677 | 00:03|
|3 | 0.103251 | 0.037902 | 0.991168 | 00:03|
|4 | 0.083039 | 0.037244 | 0.992640 | 00:03|
|5 | 0.070180 | 0.028697 | 0.991168 | 00:03|
|6 | 0.059144 | 0.024078 | 0.993131 | 00:03|
|7 | 0.055209 | 0.022824 | 0.993131 | 00:03|
|8 | 0.052700 | 0.020156 | 0.994112 | 00:03|
|9 | 0.046110 | 0.023263 | 0.991659 | 00:03|

[I 2021-06-23 16:56:12,867] Trial 3 finished with value: 0.9916585087776184 and parameters: {'apply_tfms': True, 'max_rotate': 45, 'max_zoom': 1.0501142679315925, 'p_affine': 0.8, 'n_layers': 5, 'n_channels_0': 16, 'n_channels_1': 6, 'n_channels_2': 10, 'n_channels_3': 13, 'n_channels_4': 16}. Best is trial 0 with value: 0.9950932264328003.

Overall, there seems to be a conflict between the accuracy, which should be maximized, and the validation loss, which should be minimized: in the code snippet provided by the optuna example, pruning effectively ends up maximizing the validation loss.
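To make the mismatch concrete, here is a stripped-down toy sketch (not the fastai example itself, just the reporting pattern I believe is at play): the study direction is maximize, but the values reported to the pruner are losses, so the lowest-loss trials are the ones that look worst to the pruner.

```python
import optuna

# Toy objective: each trial reports a constant "loss" at every step.
# Because the study direction is "maximize", the MedianPruner treats low
# reported values as bad, so the low-loss (i.e. good) trials get pruned.
def objective(trial):
    loss = trial.suggest_float("loss", 0.0, 1.0)
    for step in range(5):
        trial.report(loss, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return loss

pruner = optuna.pruners.MedianPruner(n_startup_trials=2)
study = optuna.create_study(direction="maximize", pruner=pruner)
study.optimize(objective, n_trials=10)

# The pruned trials are the ones whose loss is *below* the running median.
print([t.params["loss"] for t in study.trials
       if t.state == optuna.trial.TrialState.PRUNED])
```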

@maxiswiz Thank you for raising this question about the fastai example, and for the detailed logs, which make the problem easy to understand. I certainly agree with you.

The default value of monitor in FastAIV2PruningCallback is valid_loss, but the optimization direction in this example is maximize, as you mentioned. I think we should set accuracy as the monitor value of FastAIV2PruningCallback.
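Something along these lines (a minimal sketch of the suggested change, not the exact example code; `build_learner` is a hypothetical stand-in for the example's data/model setup, and the returned value assumes accuracy is the last recorded metric):

```python
from optuna.integration import FastAIV2PruningCallback

def objective(trial):
    learn = build_learner(trial)  # hypothetical helper standing in for the example's dls/model setup
    # Monitor "accuracy" so that the quantity reported to the pruner matches the
    # quantity the study maximizes, instead of the default valid_loss.
    learn.fit(10, cbs=[FastAIV2PruningCallback(trial, monitor="accuracy")])
    # Last recorded value of the final epoch (assumed here to be the accuracy metric).
    return learn.recorder.values[-1][-1]
```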

I think the fastAI v1 example has the same issue.