About auto mixed precision training
zhangliang-04 opened this issue · 1 comments
zhangliang-04 commented
Hi,
There are mixed-precision-related arguments (like `--fp16` and `--fp16_opt_level`) in `main_pretrain.py`, but it seems they are not used. I have tried `apex.amp` for mixed precision and found that it works well in pretraining Stage I (nearly doubling the speed). In Stage II, however, the gradients always become NaN. Have you ever had a similar problem? How could this happen? `torch.cuda.amp` has the same issue.
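For reference, here is a minimal sketch of how `torch.cuda.amp` could be wired into a pretraining loop; the `model`, `optimizer`, and `dataloader` names are placeholders rather than objects from this repo. `GradScaler` skips optimizer steps whose gradients overflow, so if the NaNs persist even with this setup, they likely originate in the model or loss computation rather than in the loss scaling itself.

```python
# Minimal mixed-precision training loop sketch (assumes hypothetical
# `model`, `optimizer`, and `dataloader` objects).
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    optimizer.zero_grad()
    with autocast():               # forward pass runs in mixed precision
        loss = model(**batch)      # hypothetical: model returns the training loss
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.unscale_(optimizer)     # unscale so gradients can be clipped in fp32
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)         # step is skipped if any gradient is NaN/inf
    scaler.update()                # adjust the loss scale for the next iteration
```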
ArrowLuo commented
Hi @zhangliang-04, we did not use `--fp16` in this paper. I tested fp16 training before, but it failed, and I have no idea about this problem. So sorry for that.