Paper: Revisit Knowledge Distillation: A Teacher Free Framework
NeelayS opened this issue · 0 comments
- Paper Link: https://arxiv.org/abs/1909.11723
Description
Proposes that one of the major reasons KD works is the presence of soft labels for the student to learn from, rather than hard labels. Builds on this observation to propose two teacher-free frameworks:
1. Virtual teacher - Generates soft labels during student training, with a high probability (~0.9) assigned to the correct class and the remainder distributed uniformly among the remaining classes.
2. Self-training - Trains a copy of the student itself to act as the teacher, then uses this copy to perform regular KD on the student.
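The virtual-teacher label construction in item 1 is straightforward to sketch. Below is a minimal NumPy version (the function name and the `correct_prob=0.9` default are my own choices for illustration; the paper treats this probability as a hyperparameter):

```python
import numpy as np

def virtual_teacher_soft_labels(labels, num_classes, correct_prob=0.9):
    """Build the virtual teacher's soft distribution: `correct_prob`
    on the true class, the remaining mass spread uniformly over the
    other `num_classes - 1` classes."""
    rest = (1.0 - correct_prob) / (num_classes - 1)
    soft = np.full((len(labels), num_classes), rest)
    soft[np.arange(len(labels)), labels] = correct_prob
    return soft

# e.g. for true class 2 out of 5 classes:
# virtual_teacher_soft_labels([2], 5)
# gives [0.025, 0.025, 0.9, 0.025, 0.025]
```

The student is then trained against these soft targets with the usual KD cross-entropy/KL loss, in place of a real teacher's logits.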