Loss becomes nan after ~200 steps.
Thien223 opened this issue · 2 comments
Thien223 commented
Thank you for your work.
I have an issue: after ~200 training steps, the loss becomes NaN, as in the following log:
loss=nan, log_p=nan, logdet=nan]922]t=5.16016]
Is this normal? Do you have any experience fixing this issue? Thank you so much.
ryhorv commented
Hi. Thanks for the question.
I would recommend training the model from the dev branch, using float32 dtype and scale = 1. This problem occurs in the ActNorm layer and appears only in the early steps of training. Just try starting training again.
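For context, this failure mode is typical of Glow-style ActNorm layers, whose scale and bias are initialized from the statistics of the first batch: if the per-channel standard deviation is near zero (or underflows in float16), the log of the scale becomes -inf and the logdet term turns NaN. Below is a minimal PyTorch-style sketch of such a layer, not this repository's actual code; the names and the epsilon clamp are assumptions to illustrate where the NaN can originate.

```python
import torch
import torch.nn as nn

class ActNorm(nn.Module):
    """Sketch of an ActNorm layer with data-dependent initialization."""
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(1, channels, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1))
        self.eps = eps
        self.initialized = False

    def forward(self, x):
        # x: (batch, channels, time)
        if not self.initialized:
            with torch.no_grad():
                mean = x.mean(dim=(0, 2), keepdim=True)
                std = x.std(dim=(0, 2), keepdim=True)
                # If std is ~0, or underflows in float16, log(std) is -inf
                # and the logdet becomes NaN; the clamp guards against that.
                self.bias.copy_(-mean)
                self.log_scale.copy_(-torch.log(std.clamp(min=self.eps)))
            self.initialized = True
        y = (x + self.bias) * torch.exp(self.log_scale)
        # Log-determinant of the affine transform, summed over channels.
        logdet = x.size(2) * self.log_scale.sum()
        return y, logdet
```

This is also why restarting training can help: the data-dependent initialization runs again on a different first batch, which may have better-conditioned statistics.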
Thien223 commented
Thanks for your quick reply.
I'm not sure what the dev branch is, but I have changed the dtype to float32 in hparams and the scale to 1.
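For reference, the change described probably amounts to a two-line edit of the hyperparameters; the field names below are guesses, so check the repository's actual hparams definition.

```python
# Hypothetical hparams fields; the real names in this repository may differ.
hparams.dtype = 'float32'  # train in full precision instead of float16
hparams.scale = 1.0        # set scale to 1, as suggested above
```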
It seems to be OK now.
Thank you.