ryhorv/tf-flowavenet

Loss becomes nan after ~200 steps.

Thien223 opened this issue · 2 comments

Thank you for your work.

I have an issue: after ~200 training steps, the loss becomes NaN, as follows:

loss=nan, log_p=nan, logdet=nan]922]t=5.16016]

Is this normal? Do you have any experience fixing this issue? Thank you so much.

Hi. Thanks for the question.
I would recommend training the model from the dev branch with the float32 dtype and scale = 1. This problem occurs in the ActNorm layer and appears only in the early steps of training, so if it happens, just try starting the training again.
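
To illustrate why the dtype can matter here, below is a minimal NumPy sketch of a data-dependent ActNorm-style initialization. It is not the code from this repository: the helper name `actnorm_init_log_scale`, the tensor shape, and the very quiet input batch are all assumptions made for the example, and it shows only one plausible way a half-precision ActNorm can produce non-finite values.

```python
import numpy as np


def actnorm_init_log_scale(x, dtype):
    """Per-channel log-scale computed from the first batch.

    Hypothetical helper for illustration only, not the repository's code.
    x has shape (batch, time, channels).
    """
    x = x.astype(dtype)
    mean = x.mean(axis=(0, 1))
    centered = x - mean
    # Squared deviations are where a narrow dtype can underflow to zero.
    var = (centered * centered).mean(axis=(0, 1))
    std = np.sqrt(var)
    with np.errstate(divide="ignore"):
        # The layer scales by 1 / std, stored as a log for the logdet term.
        return -np.log(std)


rng = np.random.default_rng(0)
# Assumed input: a very quiet batch with standard deviation around 1e-5.
batch = 1e-5 * rng.standard_normal((4, 100, 8))

log_scale_16 = actnorm_init_log_scale(batch, np.float16)
log_scale_32 = actnorm_init_log_scale(batch, np.float32)

print("float16 finite:", bool(np.isfinite(log_scale_16).all()))
print("float32 finite:", bool(np.isfinite(log_scale_32).all()))
```

In this sketch the squared deviations (about 1e-10) are below what float16 can represent, so the variance rounds to zero and the log-scale becomes infinite, while the float32 run stays finite. A non-finite log-scale would then propagate into logdet and the loss.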

Thanks for your quick reply.

I'm not sure what the dev branch is, but I have changed the dtype to float32 in hparams and set the scale to 1.

It seems to be OK now.

Thank you.