Paper: Revisit Knowledge Distillation: A Teacher Free Framework
NeelayS opened this issue · 0 comments
- Paper Link: https://arxiv.org/abs/1909.11723
Description
Proposes that one of the major reasons KD works is the presence of soft labels for the student to learn from, rather than hard labels. Builds on this observation to propose two teacher-free frameworks:
1. Virtual teacher - Generates soft labels during student training, with a high probability (~0.9) assigned to the correct class and the remainder distributed uniformly among the remaining classes.
2. Self-training - Trains a copy of the student itself to act as the teacher, then uses this copy to perform regular KD on the student.
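The virtual-teacher label construction in item 1 is straightforward to sketch. Below is a minimal NumPy version (the function name and the `correct_prob=0.9` default are my own choices for illustration; the paper treats this probability as a hyperparameter):

```python
import numpy as np

def virtual_teacher_soft_labels(labels, num_classes, correct_prob=0.9):
    """Build the virtual teacher's soft distribution: `correct_prob`
    on the true class, the remaining mass spread uniformly over the
    other `num_classes - 1` classes."""
    rest = (1.0 - correct_prob) / (num_classes - 1)
    soft = np.full((len(labels), num_classes), rest)
    soft[np.arange(len(labels)), labels] = correct_prob
    return soft

# e.g. for true class 2 out of 5 classes:
# virtual_teacher_soft_labels([2], 5)
# gives [0.025, 0.025, 0.9, 0.025, 0.025]
```

The student is then trained against these soft targets with the usual KD cross-entropy/KL loss, in place of a real teacher's logits.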