/CTKD

[AAAI 2023] Code for "Curriculum Temperature for Knowledge Distillation"

Primary LanguagePython

Curriculum Temperature for Knowledge Distillation

This repo is the official PyTorch implementation for "Curriculum Temperature for Knowledge Distillation" (AAAI 2023) https://arxiv.org/abs/2211.16231

Detailed Chinese interpretation is available on a Chinese forum Zhihu: https://zhuanlan.zhihu.com/p/595735843.

Abstract

CTKD organizes the distillation task from easy to hard through a dynamic and learnable temperature. The temperature is learned during the student’s training process with a reversed gradient that aims to maximize the distillation loss (i.e., increase the learning difficulty) between teacher and student in an adversarial manner.

As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing state-of-the-art knowledge distillation frameworks and brings general improvements at a negligible additional computation cost.

Framework

(a) We introduce a learnable temperature module that predicts a suitable temperature τ for distillation. The gradient reversal layer is proposed to reverse the gradient of the temperature module during the backpropagation.

(b) Following the easy-to-hard curriculum, we gradually increase the parameter λ, leading to increased learning difficulty w.r.t. temperature for the student.

Visualization

The learning curves of temperature during training:

Main Results

On CIFAR-100:

Teacher
Student
RN-56
RN-20
RN-110
RN-32
RN-110
RN-20
WRN-40-2
WRN-16-2
WRN-40-2
WRN-40-1
VGG-13
VGG-8
KD 70.66 73.08 70.66 74.92 73.54 72.98
+CTKD 71.19 73.52 70.99 75.45 73.93 73.52

On ImageNet-2012:

Teacher
(RN-34)
Student
(RN-18)
KD +CTKD
Top-1 73.96 70.26 70.83 71.32
Top-5 91.58 89.50 90.31 90.27

Requirements

  • Python 3.8
  • Pytorch 1.11.0
  • Torchvision 0.12.0

Running

  1. Download the pretrained teacher models and put them to ./save/models.
Dataset Download
CIFAR teacher models [Baidu Yun]
ImageNet teacher models [Baidu Yun]

If you want to train your teacher model, please consider using ./scripts/run_cifar_vanilla.sh or ./scripts/run_imagenet_vanilla.sh.

After the training process, put your teacher model to ./save/models.

  1. Training on CIFAR-100:
  • Download the dataset and change the path in ./dataset/cifar100.py line 27 to your current dataset path.
  • Modify the script scripts/run_cifar_distill.sh according to your needs.
  • Run the script.
    sh scripts/run_cifar_distill.sh  
  1. Training on ImageNet-2012:
  • Download the dataset and change the path in ./dataset/imagenet.py line 21 to your current dataset path.
  • Modify the script scripts/run_imagenet_distill.sh according to your needs.
  • Run the script.
    sh scripts/run_imagenet_distill.sh  

Model Zoo

We provide complete training configs, logs, and models for your reference.

CIFAR-100:

  • Combing CTKD with existing KD methods, including vanilla KD, PKT, SP, VID, CRD, SRRL, and DKD.
    (Teacher: RN-56, Student: RN-20)
    [Baidu Yun] [Google]

Citation

If this repo is helpful for your research, please consider citing our paper:

@inproceedings{li2023curriculum,
  title={Curriculum temperature for knowledge distillation},
  author={Li, Zheng and Li, Xiang and Yang, Lingfeng and Zhao, Borui and Song, Renjie and Luo, Lei and Li, Jun and Yang, Jian},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={2},
  pages={1504--1512},
  year={2023}
}

For any questions, please contact me via email (zhengli97@mail.nankai.edu.cn).