Cannot reproduce the best single-model result
cengzy14 opened this issue · 7 comments
I followed all the instructions and used the default hyperparameters, which should give the best results. However, with the default random seed of 1204, I can only get 69.84 on the test-dev split, which is 0.2 lower than the reported result. I also notice that the reported standard deviation on the val split is around 0.11.
Can you give me some advice on how to fix the gap?
Thx!
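For reference, a minimal sketch of the usual seeding recipe this kind of comparison assumes (the cuDNN flags below are the standard levers on recent PyTorch; their availability on very old versions such as 0.3.x may differ):

```python
import random

import numpy as np
import torch

def set_seed(seed=1204):
    """Seed every RNG that typically affects a training run."""
    random.seed(seed)                 # Python stdlib RNG
    np.random.seed(seed)              # NumPy RNG (data pipeline, augmentation)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on all visible GPUs
    # Ask cuDNN for deterministic kernels and disable its autotuner,
    # which otherwise selects algorithms nondeterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(1204)
```

Even with all of this, results can still differ across GPU models and cuDNN versions, which is relevant to the gap discussed below.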
First of all, did you confirm the single best score with our pretrained model? Did you keep the default batch size?
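Sanity-checking against a released checkpoint usually looks like the generic sketch below (this is not the repo's actual evaluation script; the path, `build_model`, `evaluate`, and `eval_loader` are placeholders for illustration):

```python
import torch

# Placeholder path and helpers, for illustration only.
model = build_model()                                  # hypothetical: same config as training
checkpoint = torch.load('saved_models/model.pth', map_location='cpu')
model.load_state_dict(checkpoint)
model.eval()                                           # freeze dropout / batch-norm statistics

with torch.no_grad():                                  # on PyTorch 0.3.x, use volatile Variables instead
    score = evaluate(model, eval_loader)               # hypothetical evaluation loop
print('pretrained checkpoint score: %.2f' % score)
```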
Thank you for your timely advice!
I have checked the pretrained single best model and got 70.04. My batch size is 256, as in the default setting.
I'm wondering whether something is wrong with the default hyperparameters.
With seed=1204, I can only get 69.84 on the test-dev split. Here is my log:
```
nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
epoch 0, time: 3231.65
train_loss: 6.23, norm: 12.0518, score: 40.71
gradual warmup lr: 0.0007
epoch 1, time: 3157.82
train_loss: 3.33, norm: 4.1553, score: 51.09
gradual warmup lr: 0.0010
epoch 2, time: 3149.84
train_loss: 3.05, norm: 2.6164, score: 55.29
gradual warmup lr: 0.0014
epoch 3, time: 3163.43
train_loss: 2.87, norm: 1.8427, score: 58.06
lr: 0.0014
epoch 4, time: 3164.04
train_loss: 2.70, norm: 1.4510, score: 60.73
lr: 0.0014
epoch 5, time: 3148.83
train_loss: 2.57, norm: 1.2653, score: 62.84
lr: 0.0014
epoch 6, time: 3155.81
train_loss: 2.47, norm: 1.1613, score: 64.59
lr: 0.0014
epoch 7, time: 3197.79
train_loss: 2.38, norm: 1.1030, score: 66.20
lr: 0.0014
epoch 8, time: 3177.89
train_loss: 2.29, norm: 1.0696, score: 67.63
lr: 0.0014
epoch 9, time: 3176.49
train_loss: 2.22, norm: 1.0529, score: 68.99
decreased lr: 0.0003
epoch 10, time: 3193.20
train_loss: 2.02, norm: 1.0121, score: 72.29
lr: 0.0003
epoch 11, time: 3201.57
train_loss: 1.95, norm: 1.0404, score: 73.58
decreased lr: 0.0001
epoch 12, time: 3208.42
train_loss: 1.88, norm: 1.0369, score: 74.85
```
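The learning-rate schedule those `lr:` lines trace can be reconstructed as follows — a sketch inferred from the log and the printed hyperparameters (the warmup multipliers and the decay start epoch are read off the log, not taken from the repo's code):

```python
# base lr 0.0007, warmed up over the first 4 epochs as 0.5x/1.0x/1.5x/2.0x of
# base, then held at 0.0014 and multiplied by decay_rate=0.25 every
# decay_step=2 epochs starting around epoch 10 (inferred from the log).
def lr_at_epoch(epoch, base_lr=0.0007, decay_step=2, decay_rate=0.25, decay_start=10):
    warmup = [0.5, 1.0, 1.5, 2.0]          # multipliers for epochs 0..3
    if epoch < len(warmup):
        return base_lr * warmup[epoch]
    lr = base_lr * warmup[-1]              # peak lr after warmup: 0.0014
    if epoch >= decay_start:
        n_decays = (epoch - decay_start) // decay_step + 1
        lr *= decay_rate ** n_decays
    return lr

for e in range(13):
    print('epoch %2d  lr: %.4f' % (e, lr_at_epoch(e)))
```

Running this reproduces the progression in the log: 0.0003 / 0.0007 / 0.0010 / 0.0014 during warmup, 0.0014 until epoch 9, then roughly 0.0003 at epochs 10-11 and 0.0001 at epoch 12.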
The final score is also lower than in the provided log. I noticed the provided log seems to use seed=204, so I changed my seed to 204 and still got 69.84 on the test-dev split. Here is my log, which is closer to the provided one:
```
nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
epoch 0, time: 3295.17
train_loss: 6.38, norm: 12.1419, score: 39.25
gradual warmup lr: 0.0007
epoch 1, time: 3236.02
train_loss: 3.38, norm: 4.1166, score: 50.38
gradual warmup lr: 0.0010
epoch 2, time: 8882.89
train_loss: 3.06, norm: 2.5824, score: 54.96
gradual warmup lr: 0.0014
epoch 3, time: 6159.07
train_loss: 2.88, norm: 1.8257, score: 57.88
lr: 0.0014
epoch 4, time: 3240.29
train_loss: 2.71, norm: 1.4380, score: 60.66
lr: 0.0014
epoch 5, time: 3232.00
train_loss: 2.57, norm: 1.2548, score: 62.79
lr: 0.0014
epoch 6, time: 3219.37
train_loss: 2.47, norm: 1.1558, score: 64.58
lr: 0.0014
epoch 7, time: 3238.07
train_loss: 2.37, norm: 1.0985, score: 66.28
lr: 0.0014
epoch 8, time: 3255.24
train_loss: 2.29, norm: 1.0676, score: 67.72
lr: 0.0014
epoch 9, time: 3249.98
train_loss: 2.21, norm: 1.0496, score: 69.17
decreased lr: 0.0003
epoch 10, time: 3212.20
train_loss: 2.01, norm: 1.0072, score: 72.51
lr: 0.0003
epoch 11, time: 3235.89
train_loss: 1.94, norm: 1.0362, score: 73.77
decreased lr: 0.0001
epoch 12, time: 3240.93
train_loss: 1.87, norm: 1.0323, score: 75.03
```
Has anyone else encountered this problem? Any advice?
The seed is 1204. Could you check with PyTorch version 0.3.1?
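A quick way to compare environments before digging further — a small sketch that also prints the GPU model, since it affects exact reproduction:

```python
import torch

print(torch.__version__)                 # e.g. 0.3.1 vs 0.4.1
print(torch.version.cuda)                # CUDA toolkit the build targets
print(torch.backends.cudnn.version())    # cuDNN kernels differ across versions
print(torch.cuda.get_device_name(0))     # GPU model (Titan X vs Titan Xp, etc.)
```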
I have also encountered the same issue. My PyTorch version is 0.4.1. Do you think the different PyTorch version might be the cause?
My PyTorch version is 0.3.1, so I don't think the PyTorch version is the issue. Did you get the same result as mine?
Sorry for the late response. @cengzy14 that may be within the range of the model's variance across random seeds, though the standard deviation is around ±0.1%. Exact reproduction also depends on your GPUs; for this model, we used 4 Titan Xs (not Xps). We selected the model based on test-dev results.
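Quantifying that variance is straightforward if you can afford several runs — a sketch, where `train_and_eval` is a hypothetical helper wrapping one full training-and-evaluation run:

```python
import statistics

def report_seed_variance(train_and_eval, seeds=(204, 1204, 7, 42, 2019)):
    """Train the same config under several seeds and report mean +/- std."""
    scores = [train_and_eval(seed=s) for s in seeds]  # one full run per seed
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)                    # sample standard deviation
    print('score: %.2f +/- %.2f over %d seeds' % (mean, std, len(scores)))
    return mean, std
```

If the 69.84 vs 70.04 gap falls within roughly one standard deviation of such runs, the difference is consistent with seed and hardware variance rather than a configuration error.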