nanxiaotong/STKD
Knowledge distillation (KD) transfers discriminative knowledge from a large, complex model (the Teacher) to a smaller, faster one (the Student). Existing advanced knowledge distillation methods are limited to fixed feature-extraction paradigms that capture the teacher's structural knowledge to guide the training of the student, and thus often fail to transfer comprehensive knowledge to the student. To this end, we propose a new approach, Synchronous Teaching Knowledge Distillation (STKD), which integrates online teaching and offline teaching to transfer rich and comprehensive knowledge to the student.
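For context, below is a minimal sketch of the generic logit-based KD objective that distillation methods like STKD build on: a weighted sum of the hard-label cross-entropy and a temperature-softened KL divergence to the teacher's predictions. The function name `kd_loss` and the hyperparameters `temperature` and `alpha` are illustrative assumptions, not this repository's API.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Generic logit-based distillation loss (Hinton et al. style).

    Combines hard-label supervision with soft-label supervision from a teacher.
    """
    # Hard-label supervision from the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label supervision: KL divergence to the teacher's softened outputs,
    # scaled by temperature^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1 - alpha) * kl
```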