loss: nan

Question

youngjae0129 opened this issue a year ago · 2 comments

Hi, I'm Youngjae Kim.
I'm doing an experiment with custom data.
After a certain point (e.g., epoch 37, iteration 5), the loss becomes nan and stays nan for the rest of training.
Has this ever happened to you?
If so, how did you solve it?

Answer 1 · 2023-11-27T08:29:53.000Z

Hi, we did not observe a nan loss in our experiments. However, nan losses are a common problem in machine learning experiments. You can use PyTorch's anomaly detection (torch.autograd.detect_anomaly) to locate the operation that first produces the nan. You could also re-run the experiment in a different CUDA environment. Feel free to ask additional questions.
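For reference, here is a minimal sketch of how anomaly detection can be used. It does not use this repository's training code: the function name `backward_with_anomaly_check` and the toy computation are assumptions made up for illustration. In a real run you would wrap your own forward and backward pass in the same context manager.

```python
import torch


def backward_with_anomaly_check(x: torch.Tensor) -> str:
    """Run a toy backward pass under anomaly detection and report the result.

    Hypothetical helper for illustration only; replace the toy computation
    with your own forward pass and loss.
    """
    try:
        with torch.autograd.detect_anomaly():
            # sqrt has an unbounded derivative at 0. Multiplying by 0 makes the
            # incoming gradient 0, so the backward of sqrt computes 0 / 0 = nan.
            y = torch.sqrt(x) * 0.0
            y.sum().backward()
    except RuntimeError as err:
        # With anomaly detection on, autograd raises as soon as a backward
        # function returns nan, and the message names that function.
        return f"anomaly detected: {err}"
    return "no anomaly"


if __name__ == "__main__":
    print(backward_with_anomaly_check(torch.zeros(1, requires_grad=True)))
    print(backward_with_anomaly_check(torch.ones(1, requires_grad=True)))
```

Anomaly detection slows training noticeably, so it is usually enabled only while debugging, for example when resuming from a checkpoint saved shortly before the epoch where the nan first appears.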

Answer 2 · 2024-05-14T08:01:58.000Z

In addition, I would like to report another bug. When idx_unchosen is empty, the loss calculation returns nan. Perhaps the code should check for this case and set the pseudo loss to 0 when idx_unchosen is empty.
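A minimal sketch of the kind of guard this suggests, assuming the pseudo loss is a mean over per-sample losses selected by `idx_unchosen`. The actual loss code is not shown in this thread, so the names `masked_mean_loss` and `per_sample_loss` are hypothetical and the real implementation may differ.

```python
import torch


def masked_mean_loss(per_sample_loss: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Mean of the selected per-sample losses, or 0 if nothing is selected.

    Hypothetical helper: `idx` plays the role of idx_unchosen from the thread.
    """
    selected = per_sample_loss[idx]
    if selected.numel() == 0:
        # The mean of an empty tensor is nan, so return a scalar zero instead.
        # This zero is not attached to the autograd graph, which is fine when
        # the pseudo loss is added to other loss terms that do require grad.
        return per_sample_loss.new_zeros(())
    return selected.mean()


if __name__ == "__main__":
    losses = torch.tensor([1.0, 3.0])
    empty_idx = torch.tensor([], dtype=torch.long)
    print(losses[empty_idx].mean())             # unguarded mean over an empty selection
    print(masked_mean_loss(losses, empty_idx))  # guarded version
```

If the pseudo loss is the only term being back-propagated in a step, a detached zero cannot be used with backward(); in that case skipping the step when the selection is empty may be the simpler option.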