ZJULearning/resa

fp16 training loss=nan

Opened this issue · 2 comments

Hi, thank you for your work!
I have encountered a problem: when I enable fp16, the training loss is always NaN. I found that in the RESA module, after the down, up, right, and left feature fusion, the feature values become very large; many exceed 65504 (the fp16 maximum), so they actually become inf. How can I achieve mixed-precision (fp16) training without losing too much performance?
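One workaround I am considering is to run only the RESA block in fp32 while the rest of the network stays under autocast. A rough, untested sketch, assuming a standard PyTorch AMP setup (the wrapper class name is just for illustration):

```python
import torch
from torch import nn

class RESAFp32Wrapper(nn.Module):
    """Hypothetical wrapper (name made up): run only the RESA fusion
    in fp32 while the rest of the network trains under fp16 autocast."""
    def __init__(self, resa_module: nn.Module):
        super().__init__()
        self.resa = resa_module

    def forward(self, x):
        # Disable autocast for this block so the repeated directional
        # additions accumulate in fp32 (max ~3.4e38) instead of
        # overflowing fp16 (max 65504).
        with torch.cuda.amp.autocast(enabled=False):
            return self.resa(x.float())
```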

I have tried some methods, such as adding BN to the conv in the RESA module,
(screenshot: BatchNorm added after the conv in the RESA module)
but then no lanes are detected.
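For reference, roughly what I tried (a sketch of my change, not the repo's exact code; the helper name and kernel size are illustrative and may differ from the screenshot):

```python
import torch.nn as nn

# Sketch: append BatchNorm to each directional 1-D conv in the RESA
# module so activations are renormalized after every fusion step.
def make_directional_conv(channels: int, k: int = 9) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(1, k),
                  padding=(0, k // 2), bias=False),
        nn.BatchNorm2d(channels),
    )
```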
If I decrease the value of alpha (e.g., alpha=0.1), or change the activation (originally ReLU) to sigmoid, tanh, etc., won't it lose too much performance?
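To clarify why I ask: as I understand the fusion (a simplified sketch of the paper's update rule, not the repo's exact code), each step adds alpha * f(conv(shifted x)) back onto x, so a smaller alpha or a bounded activation caps the growth but also weakens the fused signal:

```python
import torch
import torch.nn.functional as F

# Simplified view of one RESA fusion step (my reading of the paper,
# not the repo's exact code). With ReLU the added term is unbounded,
# so repeated steps can push values past the fp16 max of 65504; a
# smaller alpha or a bounded activation (sigmoid/tanh) limits the
# growth, but also shrinks the fused feature.
def resa_step(x: torch.Tensor, conv, alpha: float, shift: int = 1):
    shifted = torch.roll(x, shifts=shift, dims=3)  # slide features along width
    return x + alpha * F.relu(conv(shifted))       # additive directional fusion
```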

Looking forward to your reply.

@ilaij0810
I have the same problem of loss=NaN. If I set alpha to 1.0, the loss is no longer NaN, but training does not converge.
Do you have any solutions?