MAE finetune: train loss is NaN
CodingMice opened this issue · 5 comments
info : Loss is nan, stopping training.
Hey @CodingMice ,
It might be due to amp.autocast(). Disabling it via amp.autocast(enabled=False) solved my problem.
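For reference, a minimal sketch of what that change looks like in a typical finetuning step (the model, criterion, and optimizer names are just placeholders, not from the MAE codebase):

```python
import torch

def train_step(model, criterion, optimizer, images, targets):
    optimizer.zero_grad()
    # enabled=False runs the forward pass entirely in fp32, avoiding the
    # fp16 overflows/underflows that can turn the loss into NaN
    with torch.cuda.amp.autocast(enabled=False):
        outputs = model(images)
        loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

This trades the memory/speed benefit of mixed precision for numerical stability, so it is more of a diagnostic than a long-term fix.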
It would be great if more context were provided here. There are multiple ways the loss can go to NaN, and AMP can indeed be one of them.
FYI, this long-running PyTorch issue thread could be a hint:
pytorch/pytorch#40497
And here's the troubleshooting guide for this issue (also suggested in that thread):
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#loss-is-inf-nan
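In case it helps, here is a rough sketch of the GradScaler pattern that recipe describes (again with placeholder model/criterion/optimizer names); the scaler skips the optimizer step when inf/NaN gradients appear instead of letting them corrupt the weights:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def amp_train_step(model, criterion, optimizer, images, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # step is skipped if grads contain inf/NaN
    scaler.update()                # adjusts the loss scale for the next step
    return loss.item()
```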
Anyway, a quick fix would be the one suggested by @Jeff-LiangF above.
I've faced the same issue.
Remarkably, using gradient clipping solved the issue and even improved the results.
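For anyone else hitting this, a hedged sketch of what gradient clipping looks like when finetuning with AMP (placeholder names, and the max_norm value is only an assumption, not something tuned for MAE):

```python
import torch

scaler = torch.cuda.amp.GradScaler()
max_norm = 1.0  # placeholder value, not verified for MAE finetuning

def train_step_clipped(model, criterion, optimizer, images, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale grads in place so the norm is meaningful
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)      # still skipped automatically on inf/NaN grads
    scaler.update()
    return loss.item()
```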
How should the value for gradient clipping be set? 0.1?