yzd-v/cls_KD

loss is always nan

Closed · 1 comment

When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?

yzd-v commented

When log and softmax are computed separately, recent versions of PyTorch can produce this problem. You can refer to our new implementation of NKD to modify the loss.
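
For context, the usual fix for this class of NaN is to replace a separate `torch.log(F.softmax(...))` with the fused `F.log_softmax(...)`, which avoids `log(0) = -inf`. The sketch below is only illustrative of that change, not the repository's actual NKD loss; the function names and the temperature parameter `T` are assumptions.

```python
import torch
import torch.nn.functional as F

# Unstable pattern (illustrative): softmax can underflow to 0, so the
# separate log() yields -inf and the KL term becomes NaN.
def kd_loss_unstable(student_logits, teacher_logits, T=1.0):
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = torch.log(F.softmax(student_logits / T, dim=1))  # may produce -inf
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)

# Stable variant: log_softmax fuses the two ops and stays finite.
def kd_loss_stable(student_logits, teacher_logits, T=1.0):
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)
```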