microsoft/UniVL

About auto mixed precision training

zhangliang-04 opened this issue · 1 comment

Hi,
There are mixed-precision-related arguments (like --fp16, --fp16_opt_level) in main_pretrain.py, but it seems they are not used. I tried apex.amp for mixed precision and found it works well in pretraining Stage I (nearly doubling the speed). In Stage II, however, the gradients always become NaN. Have you ever had a similar problem? What could cause this? torch.cuda.amp has the same issue.
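For reference, this is roughly the kind of torch.cuda.amp loop I tried, shown as a minimal sketch with a toy model and random data standing in for UniVL (the real script uses the UniVL model, optimizer, and dataloaders):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Toy stand-ins for the real UniVL model/data (hypothetical, illustration only).
model = torch.nn.Linear(128, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = [(torch.randn(32, 128, device="cuda"), torch.randn(32, 1, device="cuda"))
        for _ in range(10)]

scaler = GradScaler()
for inputs, targets in data:
    optimizer.zero_grad()
    with autocast():                       # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 gradient underflow
    scaler.unscale_(optimizer)             # unscale before gradient clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)                 # skips the update if any gradient is inf/NaN
    scaler.update()
```

GradScaler already skips optimizer steps when it detects inf/NaN gradients, so I would expect training to recover, but in Stage II the losses themselves stay NaN.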

Hi @zhangliang-04, we did not use --fp16 in this paper. I tested fp16 training before but it failed, and I have no idea what causes this problem. Sorry about that.