haitongli/knowledge-distillation-pytorch

About "reduction" built in KLDivLoss

junfish opened this issue · 0 comments

The reason your temperature is larger than the setting in the original paper (e.g., T = 2) may be the `reduction` mode of `KLDivLoss`. The default `reduction="mean"` averages over every element rather than over the batch, which scales the distillation loss down by the number of classes; try setting `reduction="batchmean"` in `KLDivLoss` to get the mathematically correct KL divergence. Just a guess. Welcome others to discuss.
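A minimal sketch of the difference (the logits and shapes here are illustrative, not taken from the repo): with an input of shape `(batch, classes)`, `reduction="mean"` divides the summed KL by `batch * classes`, while `reduction="batchmean"` divides by `batch` only, so the two losses differ by a factor of `num_classes`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 10
T = 2.0  # temperature, as in the distillation setup

# hypothetical student/teacher logits for illustration
student_logits = torch.randn(batch_size, num_classes)
teacher_logits = torch.randn(batch_size, num_classes)

log_p = F.log_softmax(student_logits / T, dim=1)  # student log-probs
q = F.softmax(teacher_logits / T, dim=1)          # teacher probs

# default "mean": sum over all elements / (batch * classes)
loss_mean = nn.KLDivLoss(reduction="mean")(log_p, q)
# "batchmean": sum over all elements / batch -- the true KL divergence
loss_batchmean = nn.KLDivLoss(reduction="batchmean")(log_p, q)

# batchmean is num_classes times larger than the default mean
print(loss_mean.item(), loss_batchmean.item())
```

Because the effective loss scale changes by a factor of `num_classes`, a larger T (or a different loss weight) can end up compensating for the smaller gradient when the default reduction is used.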