loss is always nan
Closed this issue · 1 comment
July-zh commented
When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?
yzd-v commented
When log and softmax are computed separately, the latest torch can cause this problem. You can refer to our new implementation of NKD to modify the loss.
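For illustration, here is a minimal sketch of the numerical issue (not the exact NKD loss): calling `torch.log` on the output of `F.softmax` can hit `log(0) = -inf` when a probability underflows, which propagates to NaN in the loss, whereas the fused `F.log_softmax` is computed stably. The names `logits_student`, `logits_teacher`, and the temperature `T` are placeholders, not taken from the repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits; large magnitudes make some softmax entries underflow to 0.
logits_student = torch.randn(4, 100) * 20
logits_teacher = torch.randn(4, 100) * 20
T = 1.0  # distillation temperature (placeholder value)

# Problematic pattern: separate softmax then log.
# Underflowed probabilities become exactly 0, and log(0) = -inf -> NaN loss.
log_p_unstable = torch.log(F.softmax(logits_student / T, dim=1))

# Stable pattern: fused log_softmax (uses the log-sum-exp trick internally).
log_p_student = F.log_softmax(logits_student / T, dim=1)
p_teacher = F.softmax(logits_teacher / T, dim=1)

# A generic KL-style distillation loss built from the stable log-probabilities.
loss_kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
print(loss_kd)
```

The key change is simply replacing any `torch.log(F.softmax(...))` in the loss with `F.log_softmax(...)`; the rest of the distillation loss can stay as it is.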