facebookresearch/mae

MAE finetune train loss nan

CodingMice opened this issue · 5 comments

info : Loss is nan, stopping training.

Hey @CodingMice ,

It might be due to amp.autocast(). Disabling it via amp.autocast(enabled=False) solved my problem.
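For context, a minimal sketch of what that quick fix might look like in a training step (the function and argument names below are placeholders, not MAE's actual code):

```python
import torch

# Hypothetical training step; all names are placeholders, not MAE's exact code.
def train_step(model, criterion, optimizer, samples, targets):
    # Run the forward pass in full precision by disabling autocast.
    with torch.cuda.amp.autocast(enabled=False):
        outputs = model(samples)
        loss = criterion(outputs, targets)

    optimizer.zero_grad()
    loss.backward()   # no GradScaler needed once AMP is off
    optimizer.step()
    return loss.item()
```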

It would be great if more context were provided here. There are multiple ways the loss can go to NaN, and AMP can indeed be one of them.

FYI - This PyTorch issue thread with a long history could be a hint...
pytorch/pytorch#40497

And here's the troubleshooting guide for this issue (also suggested in that thread):
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#loss-is-inf-nan

Anyway, a quick fix would be the one @Jeff-LiangF commented above.
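As a rough debugging aid along the lines of that recipe, a sketch like the one below can help catch the first non-finite loss while keeping AMP enabled (all names here are placeholders passed in as arguments; this is not MAE's actual training loop):

```python
import torch

def debug_train_loop(model, criterion, optimizer, data_loader):
    scaler = torch.cuda.amp.GradScaler()

    for step, (samples, targets) in enumerate(data_loader):
        with torch.cuda.amp.autocast():
            outputs = model(samples)
            loss = criterion(outputs, targets)

        # Stop early and inspect the batch as soon as the loss is no longer finite.
        if not torch.isfinite(loss):
            print(f"Non-finite loss {loss.item()} at step {step}")
            break

        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the update if grads contain inf/NaN
        scaler.update()
```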

exx8 commented

I've faced the same issue.
Remarkably, using gradient clipping has solved the issue + improved the results.
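For anyone looking for a concrete starting point, here is a minimal sketch of gradient clipping inside an AMP training step; max_norm=1.0 is only an illustrative value and the placeholder names are not taken from the MAE code. (If I remember correctly, the MAE fine-tuning script also exposes a --clip_grad argument, but check main_finetune.py.)

```python
import torch

def train_step_with_clipping(model, criterion, optimizer, scaler,
                             samples, targets, max_norm=1.0):
    """One AMP training step with gradient clipping; max_norm=1.0 is just an example."""
    with torch.cuda.amp.autocast():
        loss = criterion(model(samples), targets)

    optimizer.zero_grad()
    scaler.scale(loss).backward()

    # Unscale first so the clip threshold applies to the true gradient norms.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Unscaling before clipping matters with AMP, otherwise the threshold is applied to scaled gradients. Commonly used max_norm values seem to be around 1.0 or 5.0; 0.1 (as asked below) is stricter than usual.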


How do you set the value for gradient clipping? 0.1?