loss is always nan
Closed this issue · 1 comment
July-zh commented
When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?
yzd-v commented
When log and softmax are computed separately, the latest torch can cause this problem. You can refer to our new implementation of NKD to modify the loss.
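For illustration, here is a minimal sketch of the numerical issue (not the exact NKD loss): calling `torch.log` on the output of `F.softmax` can hit `log(0) = -inf` when a probability underflows, which propagates to NaN in the loss, whereas the fused `F.log_softmax` is computed stably. The names `logits_student`, `logits_teacher`, and the temperature `T` are placeholders, not taken from the repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits; large magnitudes make some softmax entries underflow to 0.
logits_student = torch.randn(4, 100) * 20
logits_teacher = torch.randn(4, 100) * 20
T = 1.0  # distillation temperature (placeholder value)

# Problematic pattern: separate softmax then log.
# Underflowed probabilities become exactly 0, and log(0) = -inf -> NaN loss.
log_p_unstable = torch.log(F.softmax(logits_student / T, dim=1))

# Stable pattern: fused log_softmax (uses the log-sum-exp trick internally).
log_p_student = F.log_softmax(logits_student / T, dim=1)
p_teacher = F.softmax(logits_teacher / T, dim=1)

# A generic KL-style distillation loss built from the stable log-probabilities.
loss_kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
print(loss_kd)
```

The key change is simply replacing any `torch.log(F.softmax(...))` in the loss with `F.log_softmax(...)`; the rest of the distillation loss can stay as it is.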