Cannot reproduce the best single-model result
cengzy14 opened this issue · 7 comments
I followed all the instructions and used the default hyperparameters, which should give the best results. However, with the default random seed of 1204, I can only get 69.84 on the test-dev split, which is 0.2 lower than the reported result. I also notice that the reported standard deviation on the val split is around 0.11.
Can you give me some advice on how to fix the gap?
Thx!
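For reference, a minimal sketch of the usual seeding recipe this kind of comparison assumes (the cuDNN flags below are the standard levers on recent PyTorch; their availability on very old versions such as 0.3.x may differ):

```python
import random

import numpy as np
import torch

def set_seed(seed=1204):
    """Seed every RNG that typically affects a training run."""
    random.seed(seed)                 # Python stdlib RNG
    np.random.seed(seed)              # NumPy RNG (data pipeline, augmentation)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on all visible GPUs
    # Ask cuDNN for deterministic kernels and disable its autotuner,
    # which otherwise selects algorithms nondeterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(1204)
```

Even with all of this, results can still differ across GPU models and cuDNN versions, which is relevant to the gap discussed below.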
First of all, did you confirm the single best score with our pretrained model? Did you keep the default batch size?
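Sanity-checking against a released checkpoint usually looks like the generic sketch below (this is not the repo's actual evaluation script; the path, `build_model`, `evaluate`, and `eval_loader` are placeholders for illustration):

```python
import torch

# Placeholder path and helpers, for illustration only.
model = build_model()                                  # hypothetical: same config as training
checkpoint = torch.load('saved_models/model.pth', map_location='cpu')
model.load_state_dict(checkpoint)
model.eval()                                           # freeze dropout / batch-norm statistics

with torch.no_grad():                                  # on PyTorch 0.3.x, use volatile Variables instead
    score = evaluate(model, eval_loader)               # hypothetical evaluation loop
print('pretrained checkpoint score: %.2f' % score)
```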
Thank you for your timely advice!
I have checked the pretrained single best model and got 70.04. My batch size is 256, as in the default setting.
I'm wondering whether something is wrong with the default hyperparameters.
With seed=1204, I can only get 69.84 on the test-dev split. Here is my log:
```
nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
epoch 0, time: 3231.65
train_loss: 6.23, norm: 12.0518, score: 40.71
gradual warmup lr: 0.0007
epoch 1, time: 3157.82
train_loss: 3.33, norm: 4.1553, score: 51.09
gradual warmup lr: 0.0010
epoch 2, time: 3149.84
train_loss: 3.05, norm: 2.6164, score: 55.29
gradual warmup lr: 0.0014
epoch 3, time: 3163.43
train_loss: 2.87, norm: 1.8427, score: 58.06
lr: 0.0014
epoch 4, time: 3164.04
train_loss: 2.70, norm: 1.4510, score: 60.73
lr: 0.0014
epoch 5, time: 3148.83
train_loss: 2.57, norm: 1.2653, score: 62.84
lr: 0.0014
epoch 6, time: 3155.81
train_loss: 2.47, norm: 1.1613, score: 64.59
lr: 0.0014
epoch 7, time: 3197.79
train_loss: 2.38, norm: 1.1030, score: 66.20
lr: 0.0014
epoch 8, time: 3177.89
train_loss: 2.29, norm: 1.0696, score: 67.63
lr: 0.0014
epoch 9, time: 3176.49
train_loss: 2.22, norm: 1.0529, score: 68.99
decreased lr: 0.0003
epoch 10, time: 3193.20
train_loss: 2.02, norm: 1.0121, score: 72.29
lr: 0.0003
epoch 11, time: 3201.57
train_loss: 1.95, norm: 1.0404, score: 73.58
decreased lr: 0.0001
epoch 12, time: 3208.42
train_loss: 1.88, norm: 1.0369, score: 74.85
```
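The learning-rate schedule those `lr:` lines trace can be reconstructed as follows — a sketch inferred from the log and the printed hyperparameters (the warmup multipliers and the decay start epoch are read off the log, not taken from the repo's code):

```python
# base lr 0.0007, warmed up over the first 4 epochs as 0.5x/1.0x/1.5x/2.0x of
# base, then held at 0.0014 and multiplied by decay_rate=0.25 every
# decay_step=2 epochs starting around epoch 10 (inferred from the log).
def lr_at_epoch(epoch, base_lr=0.0007, decay_step=2, decay_rate=0.25, decay_start=10):
    warmup = [0.5, 1.0, 1.5, 2.0]          # multipliers for epochs 0..3
    if epoch < len(warmup):
        return base_lr * warmup[epoch]
    lr = base_lr * warmup[-1]              # peak lr after warmup: 0.0014
    if epoch >= decay_start:
        n_decays = (epoch - decay_start) // decay_step + 1
        lr *= decay_rate ** n_decays
    return lr

for e in range(13):
    print('epoch %2d  lr: %.4f' % (e, lr_at_epoch(e)))
```

Running this reproduces the progression in the log: 0.0003 / 0.0007 / 0.0010 / 0.0014 during warmup, 0.0014 until epoch 9, then roughly 0.0003 at epochs 10-11 and 0.0001 at epoch 12.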
The final score is also lower than in the provided log. I noticed the provided log seems to use seed=204, so I changed my seed to 204 and still got 69.84 on the test-dev split. Here is my log, which is closer to the provided one:
```
nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
epoch 0, time: 3295.17
train_loss: 6.38, norm: 12.1419, score: 39.25
gradual warmup lr: 0.0007
epoch 1, time: 3236.02
train_loss: 3.38, norm: 4.1166, score: 50.38
gradual warmup lr: 0.0010
epoch 2, time: 8882.89
train_loss: 3.06, norm: 2.5824, score: 54.96
gradual warmup lr: 0.0014
epoch 3, time: 6159.07
train_loss: 2.88, norm: 1.8257, score: 57.88
lr: 0.0014
epoch 4, time: 3240.29
train_loss: 2.71, norm: 1.4380, score: 60.66
lr: 0.0014
epoch 5, time: 3232.00
train_loss: 2.57, norm: 1.2548, score: 62.79
lr: 0.0014
epoch 6, time: 3219.37
train_loss: 2.47, norm: 1.1558, score: 64.58
lr: 0.0014
epoch 7, time: 3238.07
train_loss: 2.37, norm: 1.0985, score: 66.28
lr: 0.0014
epoch 8, time: 3255.24
train_loss: 2.29, norm: 1.0676, score: 67.72
lr: 0.0014
epoch 9, time: 3249.98
train_loss: 2.21, norm: 1.0496, score: 69.17
decreased lr: 0.0003
epoch 10, time: 3212.20
train_loss: 2.01, norm: 1.0072, score: 72.51
lr: 0.0003
epoch 11, time: 3235.89
train_loss: 1.94, norm: 1.0362, score: 73.77
decreased lr: 0.0001
epoch 12, time: 3240.93
train_loss: 1.87, norm: 1.0323, score: 75.03
```
Has anyone else encountered this problem? Any advice?
The seed is 1204. Could you check with PyTorch version 0.3.1?
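A quick way to compare environments before digging further — a small sketch that also prints the GPU model, since it affects exact reproduction:

```python
import torch

print(torch.__version__)                 # e.g. 0.3.1 vs 0.4.1
print(torch.version.cuda)                # CUDA toolkit the build targets
print(torch.backends.cudnn.version())    # cuDNN kernels differ across versions
print(torch.cuda.get_device_name(0))     # GPU model (Titan X vs Titan Xp, etc.)
```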
I have also encountered the same issue. My PyTorch version is 0.4.1. Do you think the different PyTorch version might be the cause?
My PyTorch version is 0.3.1, so I don't think the PyTorch version is the issue. Did you get the same result as mine?
Sorry for the late response. @cengzy14 that may be within the range of the model's variance across random seeds, though the standard deviation is around ±0.1%. Exact reproduction also depends on your GPUs; for this model, we used 4 Titan Xs (not Xps). We selected the model based on test-dev results.
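Quantifying that variance is straightforward if you can afford several runs — a sketch, where `train_and_eval` is a hypothetical helper wrapping one full training-and-evaluation run:

```python
import statistics

def report_seed_variance(train_and_eval, seeds=(204, 1204, 7, 42, 2019)):
    """Train the same config under several seeds and report mean +/- std."""
    scores = [train_and_eval(seed=s) for s in seeds]  # one full run per seed
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)                    # sample standard deviation
    print('score: %.2f +/- %.2f over %d seeds' % (mean, std, len(scores)))
    return mean, std
```

If the 69.84 vs 70.04 gap falls within roughly one standard deviation of such runs, the difference is consistent with seed and hardware variance rather than a configuration error.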