snumprlab/cl-alfred

What is the distillation loss in the code used for?


I noticed that the distillation loss in your code is not mentioned in your paper. What is it used for?

Hi @wqshmzh,

The distillation loss is the term $\alpha \mathbb{E}_{(x,z)\sim\mathcal{M}}[\|z - \pi(x)\|^2_2]$ in Equation 3 of our paper, implemented at L153-L160. While learning the current task, the current model $\pi$ is trained with this loss so that its output logits stay close to the logits $z$ stored in memory $\mathcal{M}$, which were produced by the previous models (i.e., the models trained on the previous tasks).
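For reference, here is a minimal sketch of that term in PyTorch. It is not the repository's exact code; the function name `distillation_loss` and the argument names are illustrative, and it only assumes that replayed logits `z` were saved in memory alongside their inputs.

```python
import torch

def distillation_loss(current_logits: torch.Tensor,
                      stored_logits: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """Logit-distillation term of Eq. 3:
    alpha * E_{(x, z) ~ M} [ || z - pi(x) ||_2^2 ].

    current_logits: pi(x), the current model's logits on inputs x replayed from memory.
    stored_logits:  z, the logits saved in memory by the previous models.
    """
    # Squared L2 norm per memory sample, then the expectation over the batch.
    sq_l2 = (stored_logits - current_logits).pow(2).sum(dim=-1)
    return alpha * sq_l2.mean()

# Hypothetical usage with random tensors standing in for real logits:
pi_x = torch.randn(4, 10, requires_grad=True)  # pi(x) from the current model
z = torch.randn(4, 10)                         # z saved when the earlier task was learned
loss = distillation_loss(pi_x, z, alpha=0.5)
loss.backward()
```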

OK, got it. Sorry, my bad!