Justherozen/ProMix

loss: nan

youngjae0129 opened this issue · 2 comments

Hi, I'm Youngjae Kim.
I'm doing an experiment with custom data.
After a certain point (e.g. Epoch 37, Iter 5), the loss becomes nan and stays nan.
Has this ever happened to you?
If so, how did you solve it?

Hi, we did not observe a nan loss in our experiments. That said, nan losses are a common problem in machine learning experiments. You can use PyTorch's anomaly detection (torch.autograd.detect_anomaly) to locate the root cause, or re-run the experiment in a different CUDA environment. Feel free to ask further questions.
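As a rough illustration of the anomaly-detection suggestion above, here is a minimal sketch that is not taken from the ProMix code. The function name `find_nan_source` and the toy computation are assumptions chosen only to trigger a nan gradient; in a real run you would wrap the forward and backward pass of the training step instead.

```python
import torch


def find_nan_source():
    """Toy example (hypothetical): provoke a nan gradient and let
    torch.autograd.detect_anomaly report which backward op produced it."""
    x = torch.tensor([0.0], requires_grad=True)
    with torch.autograd.detect_anomaly():
        # Forward value is finite (0.0), but the backward pass of sqrt at 0
        # computes 0 / 0, which is nan.
        y = torch.sqrt(x) * 0.0
        try:
            y.sum().backward()
        except RuntimeError as err:
            # The error message names the backward function that returned nan.
            return str(err)
    return None


if __name__ == "__main__":
    print(find_nan_source())
```

Anomaly detection slows training down noticeably, so it is usually enabled only while debugging and removed afterwards.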

In addition, another bug has been reported: when idx_unchosen is empty, the loss evaluates to nan. The pseudo loss should probably be set to 0 whenever idx_unchosen is empty.
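The guard suggested above could look roughly like the following sketch. It is not the repository's actual code: the function name `safe_pseudo_loss`, its arguments, and the use of cross-entropy as the pseudo loss are assumptions for illustration, and idx_unchosen is assumed to be a 1-D tensor of sample indices.

```python
import torch
import torch.nn.functional as F


def safe_pseudo_loss(logits, pseudo_targets, idx_unchosen):
    """Hypothetical helper: pseudo-label loss over the unchosen samples,
    returning 0 when there are none instead of averaging over an empty set."""
    if idx_unchosen.numel() == 0:
        # The mean of an empty tensor is nan, so skip the computation entirely.
        return logits.new_zeros(())
    return F.cross_entropy(logits[idx_unchosen], pseudo_targets[idx_unchosen])


if __name__ == "__main__":
    logits = torch.randn(4, 3)
    targets = torch.tensor([0, 1, 2, 1])
    empty = torch.empty(0, dtype=torch.long)
    print(torch.empty(0).mean())                      # mean over nothing is nan
    print(safe_pseudo_loss(logits, targets, empty))   # guarded: zero
```

Returning a zero tensor keeps the total loss finite for that batch; the other loss terms still provide gradients.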