Validation accuracy stays at 0.09% during training

Question

edizhuang opened this issue 3 years ago · 3 comments

Dear authors,

I'm interested in your paper and am training the model from scratch on ImageNet. However, the validation accuracy stays at * Acc@1 0.090 throughout training.
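
For context, here is a quick sanity check on that number. Assuming "ImageNet" here means ImageNet-1k with 1,000 classes, uniform random guessing gives about 100 / 1000 = 0.1% top-1 accuracy, so an Acc@1 of 0.090 (a percentage) is essentially chance level: the network is not learning at all. The sketch below makes that comparison explicit; the function names and the 2x tolerance are hypothetical choices, not part of any repository.

```python
def chance_top1_percent(num_classes: int) -> float:
    """Expected top-1 accuracy (in percent) of uniform random guessing."""
    return 100.0 / num_classes


def looks_like_chance(acc1_percent: float, num_classes: int, factor: float = 2.0) -> bool:
    """Hypothetical heuristic: flag accuracy no higher than `factor` times chance."""
    return acc1_percent <= factor * chance_top1_percent(num_classes)


# Assumption: ImageNet-1k, i.e. 1,000 classes.
print(chance_top1_percent(1000))       # 0.1
print(looks_like_chance(0.090, 1000))  # True  (the stuck run in this issue)
print(looks_like_chance(1.488, 1000))  # False (first epoch with native AMP, reported later in the thread)
```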

Do you have any idea why this happens? When I train Swin Transformer, it works fine.

I have tried PyTorch 1.7.1 and 1.6.0, without mixed precision, for 100 epochs:

--amp-opt-level O0 --output ./output --opts TRAIN.EPOCHS 100

Thanks,
Eddie

Answer 1 · 2021-10-23T02:52:03.000Z

Thanks for your feedback.

We recommend using mixed precision and training for 300 epochs, i.e., --amp-opt-level native --opts TRAIN.EPOCHS 300. (These appear to be the only two differences between your training command and ours.)
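
Spotting such differences can be made mechanical by parsing both option strings and diffing them. The sketch below is hypothetical and does not reproduce the repository's actual argument parser. It assumes that `--opts` takes a flat list of `KEY VALUE` pairs, as in the command in the question. Since the maintainer's full command is not shown in the thread, it also assumes that command matches the user's except for the two recommended settings.

```python
import shlex


def parse_options(cmd: str) -> dict:
    """Parse '--flag value' pairs, then treat everything after '--opts' as KEY VALUE pairs."""
    tokens = shlex.split(cmd)
    options = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--opts":
            rest = tokens[i + 1:]
            # Pair up KEY VALUE tokens; an unpaired trailing key is dropped in this sketch.
            for key, value in zip(rest[0::2], rest[1::2]):
                options[key] = value
            break
        if tok.startswith("--") and i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            options[tok] = tokens[i + 1]  # flag with a value
            i += 2
        elif tok.startswith("--"):
            options[tok] = True  # flag without a value
            i += 1
        else:
            i += 1  # stray positional token: ignored in this sketch
    return options


def diff_options(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


user_cmd = "--amp-opt-level O0 --output ./output --opts TRAIN.EPOCHS 100"
# Assumption: the recommended run differs only in the two settings named in the reply.
recommended_cmd = "--amp-opt-level native --output ./output --opts TRAIN.EPOCHS 300"

print(diff_options(parse_options(user_cmd), parse_options(recommended_cmd)))
# {'--amp-opt-level': ('O0', 'native'), 'TRAIN.EPOCHS': ('100', '300')}
```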

If you can share the exact command you ran and the training log, I may be able to give more specific advice :)

Answer 2 · 2021-10-24T02:50:08.000Z

Hi cheerss,

It does work with --amp-opt-level native: after the first epoch, validation reports * Acc@1 1.488.
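
A run that is stuck, as opposed to one that is merely slow, can be detected by pulling the top-1 values out of the training log. The sketch below assumes each validation summary line contains `* Acc@1 <value>`, matching the lines quoted in this thread. The helper names, the 2x tolerance, and the tiny sample log strings are made up for illustration. The chance level of 0.1% assumes 1,000 classes.

```python
import re
from typing import List

# Assumed log format: a line containing "* Acc@1 <float>", possibly followed by more fields.
ACC1_PATTERN = re.compile(r"\*\s*Acc@1\s+([0-9]+(?:\.[0-9]+)?)")


def extract_acc1(log_text: str) -> List[float]:
    """Return every top-1 accuracy value (in percent) found in the log, in order."""
    return [float(m.group(1)) for m in ACC1_PATTERN.finditer(log_text)]


def is_stuck(values: List[float], chance_percent: float = 0.1, factor: float = 2.0) -> bool:
    """Hypothetical check: True if every recorded epoch stays within `factor` x chance."""
    return bool(values) and all(v <= factor * chance_percent for v in values)


# Illustrative log snippets built from the values quoted in this thread.
o0_log = "* Acc@1 0.090\n* Acc@1 0.090\n"
amp_log = "* Acc@1 1.488\n"

print(extract_acc1(o0_log), is_stuck(extract_acc1(o0_log)))    # [0.09, 0.09] True
print(extract_acc1(amp_log), is_stuck(extract_acc1(amp_log)))  # [1.488] False
```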

It's very weird that only mixed precision works. Just letting you know.

Thanks,
Eddie

Answer 3 · 2021-10-26T13:25:33.000Z

It seems there was a bug when training in O0 mode. We have fixed it, and the program now works well.

Thanks for your feedback.