yzd-v/cls_KD

Questions on masked area

jwfanDL opened this issue · 0 comments

Hi Zhendong,
In ViTKD, knowledge is distilled only from the unmasked area, whereas MGD distills from the full area.
My questions are:

  1. Why does ViTKD distill knowledge only from the unmasked area?
  2. What are the difference and the relationship between the unmasked and masked areas in distillation?
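
To make the question concrete, here is a toy sketch of the loss variants I am comparing. It is not taken from the ViTKD or MGD code: the function names, the mask convention (1 = unmasked, 0 = masked), and the feature values are all my own assumptions for illustration.

```python
def mse_full(student, teacher):
    """Mean squared error over every token and channel (full-area loss)."""
    total = 0.0
    count = 0
    for s_tok, t_tok in zip(student, teacher):
        for s, t in zip(s_tok, t_tok):
            total += (s - t) ** 2
            count += 1
    return total / count


def mse_selected(student, teacher, keep):
    """Mean squared error over only the tokens where keep[i] is 1."""
    total = 0.0
    count = 0
    for s_tok, t_tok, k in zip(student, teacher, keep):
        if k:
            for s, t in zip(s_tok, t_tok):
                total += (s - t) ** 2
                count += 1
    # Return 0.0 when no token is selected to avoid dividing by zero.
    return total / count if count else 0.0


# Toy features: 4 tokens with 2 channels each (hypothetical values).
teacher = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
student = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]

# Assumed convention: 1 = token is unmasked, 0 = token is masked.
unmasked = [1, 1, 0, 0]
masked = [1 - k for k in unmasked]

loss_full = mse_full(student, teacher)                    # all tokens
loss_unmasked = mse_selected(student, teacher, unmasked)  # unmasked tokens only
loss_masked = mse_selected(student, teacher, masked)      # masked tokens only

print(loss_full, loss_unmasked, loss_masked)
```

In this toy case the student matches the teacher on the unmasked tokens, so the unmasked-only loss gives no training signal while the full-area and masked-only losses do. This is the kind of difference I would like to understand for the real methods.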