microsoft/UniVL

About auto mixed precision training

zhangliang-04 opened this issue · 1 comment

Hi,
There are mixed-precision-related arguments (like --fp16, --fp16_opt_level) in main_pretrain.py, but it seems they are not used. I tried apex.amp for mixed precision and found it works well in pretraining Stage I (nearly doubling the speed). In Stage II, however, the gradients always become NaN. Have you ever had a similar problem? What could cause this? torch.cuda.amp has the same issue.
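For reference, this is roughly the kind of torch.cuda.amp loop I tried, shown as a minimal sketch with a toy model and random data standing in for UniVL (the real script uses the UniVL model, optimizer, and dataloaders):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Toy stand-ins for the real UniVL model/data (hypothetical, illustration only).
model = torch.nn.Linear(128, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = [(torch.randn(32, 128, device="cuda"), torch.randn(32, 1, device="cuda"))
        for _ in range(10)]

scaler = GradScaler()
for inputs, targets in data:
    optimizer.zero_grad()
    with autocast():                       # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 gradient underflow
    scaler.unscale_(optimizer)             # unscale before gradient clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)                 # skips the update if any gradient is inf/NaN
    scaler.update()
```

GradScaler already skips optimizer steps when it detects inf/NaN gradients, so I would expect training to recover, but in Stage II the losses themselves stay NaN.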

Hi @zhangliang-04, we did not use --fp16 in this paper. I tested fp16 training before but it failed, and I have no idea what causes this problem. Sorry about that.