fp16 training loss=nan
Opened this issue · 2 comments
ilaij0810 commented
Hi, thank you for your work!
I've run into a problem: when I train with fp16, the loss is always NaN. I traced it to the RESA module: after the down, up, right, and left feature fusion, the feature values become very large, and many exceed 65504 (the fp16 maximum), so they overflow to inf. How can I use mixed-precision (fp16) training without losing too much performance?
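One common workaround for this kind of overflow (a hedged sketch, not from the RESA codebase; `FP32Fusion` and the toy `Add` module below are hypothetical names) is to keep the overflow-prone fusion in float32 while the rest of the network runs under autocast. fp16 saturates at 65504, so any intermediate above that becomes inf, but the same value is fine in fp32:

```python
import torch
import torch.nn as nn

# fp16 overflows past 65504; the same value is finite in fp32.
assert torch.isinf(torch.tensor(70000.0).half())
assert torch.isfinite(torch.tensor(70000.0))

class FP32Fusion(nn.Module):
    """Hypothetical wrapper: run an overflow-prone submodule in float32
    even when the surrounding network runs under torch.autocast(fp16)."""
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module.float()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        device_type = "cuda" if x.is_cuda else "cpu"
        # Locally disable autocast so the directional fusion sums
        # accumulate in fp32 instead of saturating at 65504.
        with torch.autocast(device_type=device_type, enabled=False):
            return self.module(x.float())

class Add(nn.Module):
    """Toy stand-in for the down/up/left/right fusion adds."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + x

x = torch.full((4,), 40000.0)
# In fp16 the same add overflows to inf (40000 + 40000 > 65504):
assert torch.isinf(x.half() + x.half()).all()
# Wrapped in fp32, it stays finite:
out = FP32Fusion(Add())(x)
assert torch.isfinite(out).all()
```

The wrapped output is float32, so you may want to cast it back to the autocast dtype before feeding it to the rest of the network.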
2696120622 commented
@ilaij0810
I have the same loss=NaN problem. If I set alpha to 1.0, I no longer get NaN, but then training does not converge.
Do you have any solutions?
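Another option worth trying (a hedged suggestion, not confirmed by the RESA authors): if your hardware supports it, run autocast with bfloat16 instead of float16. bfloat16 shares float32's exponent range, so fusion sums that overflow fp16's 65504 ceiling stay finite without having to change alpha:

```python
import torch

big = torch.tensor(100000.0)
assert torch.isinf(big.half())         # fp16 overflows past 65504
assert torch.isfinite(big.bfloat16())  # bf16 keeps fp32's exponent range

# Usage sketch (requires bf16-capable hardware, e.g. Ampere GPUs):
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#     loss = model(images)
```

The trade-off is that bfloat16 has fewer mantissa bits than fp16, so it is coarser in precision but far more robust to large activations.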