
Multi-teachers-Knowledge-Distillation

Distilling knowledge from an ensemble of multiple teacher networks into a single multi-head student network.
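
The approach in one picture: the student shares a single convolutional body and attaches one small classification head per teacher, so each head can mimic one teacher while the body learns features common to all of them. Below is a minimal Keras sketch of such an architecture; the layer sizes and the `build_student` name are illustrative assumptions, not the notebook's actual code.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_student(num_heads=3, num_classes=10, input_shape=(32, 32, 3)):
    """Multi-head student: one shared body, one small head per teacher.

    Layer sizes here are placeholders; the repository's notebook uses a
    ResNet body instead of this toy stack.
    """
    inputs = keras.Input(shape=input_shape)

    # Shared body - learns features common to all teachers.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # One lightweight head per teacher - each head outputs raw logits.
    outputs = [
        layers.Dense(num_classes, name=f"head_{i}")(
            layers.Dense(64, activation="relu")(x)
        )
        for i in range(num_heads)
    ]
    return keras.Model(inputs, outputs, name="multi_head_student")
```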

Reference Papers:

  1. Hydra: Preserving Ensemble Diversity for Model Distillation (https://arxiv.org/abs/2001.04694)
  2. Distilling the Knowledge in a Neural Network (https://arxiv.org/abs/1503.02531)
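
Combining the two papers above: each student head is trained against its teacher's temperature-softened logits using Hinton et al.'s soft-target loss, and the per-head losses are averaged so the heads jointly preserve the ensemble's diversity, as in Hydra. The sketch below illustrates this objective; the temperature `T`, weight `alpha`, and function names are illustrative defaults, not values taken from this repository.

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss for one (teacher, head) pair.

    Soft term: cross-entropy between the temperature-softened teacher and
    student distributions, scaled by T^2 so gradient magnitudes stay
    comparable as T grows. Hard term: ordinary cross-entropy on labels.
    """
    soft_teacher = tf.nn.softmax(teacher_logits / T)
    soft = tf.keras.losses.categorical_crossentropy(
        soft_teacher, student_logits / T, from_logits=True
    ) * (T ** 2)
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        labels, student_logits, from_logits=True
    )
    return alpha * tf.reduce_mean(soft) + (1.0 - alpha) * tf.reduce_mean(hard)

def multi_teacher_loss(all_teacher_logits, all_head_logits, labels):
    """Hydra-style total: average the per-head losses, one head per teacher."""
    losses = [
        distillation_loss(t, s, labels)
        for t, s in zip(all_teacher_logits, all_head_logits)
    ]
    return tf.add_n(losses) / float(len(losses))
```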

Reference Implementations:

  1. Keras ResNet implementation (https://keras.io/examples/cifar10_resnet)
  2. ResNet training procedure from Ko Ye Yint Htoon (https://github.com/yeyinthtoon/Knowledge-Distillation-ResNet)
  3. Part of the knowledge distillation procedure from Devopedia (https://devopedia.org/knowledge-distillation)

Dataset: CIFAR-10
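
CIFAR-10 ships with Keras, so loading it is a one-liner; the per-pixel mean subtraction below mirrors the preprocessing in the Keras ResNet example cited above, though the notebook's exact pipeline may differ.

```python
from tensorflow import keras
import numpy as np

# Load CIFAR-10: 50k train / 10k test images of shape (32, 32, 3).
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Scale to [0, 1] and subtract the per-pixel training-set mean,
# as in the Keras CIFAR-10 ResNet example.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
mean = np.mean(x_train, axis=0)
x_train -= mean
x_test -= mean
```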