lucidrains/vit-pytorch

Questions about distill_loss

haoren55555 opened this issue · 1 comment

Sorry to bother, but I see the distill_loss in distill.py is:
distill_loss = F.kl_div(
    F.log_softmax(distill_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1).detach(),
    reduction='batchmean')
I wonder why the teacher part uses the softmax function rather than the log_softmax one. Thanks.

@haoren55555 yeah, you could do log softmax for the teacher too by setting log_target = True (https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html). By default, F.kl_div expects only the input in log space and the target as plain probabilities, which is why the teacher side goes through softmax. I'm just rolling with what PyTorch offers.
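
A minimal sketch of the two equivalent formulations (not from the repo itself; the logits and temperature below are made up for illustration, and log_target requires PyTorch >= 1.6):

```python
import torch
import torch.nn.functional as F

# Hypothetical values just for demonstration; T is the distillation temperature.
T = 3.0
distill_logits = torch.randn(8, 1000)
teacher_logits = torch.randn(8, 1000)

# Default form (as in distill.py): the target must be plain probabilities,
# so the teacher side uses softmax, not log_softmax.
loss_default = F.kl_div(
    F.log_softmax(distill_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1).detach(),
    reduction='batchmean')

# Equivalent form with log_target=True: both sides are given in log space.
loss_log_target = F.kl_div(
    F.log_softmax(distill_logits / T, dim=-1),
    F.log_softmax(teacher_logits / T, dim=-1).detach(),
    reduction='batchmean',
    log_target=True)

# The two should match up to floating-point error.
assert torch.allclose(loss_default, loss_log_target, atol=1e-6)
```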