Self-Distillation for Further Pre-training of Transformers

This is the official implementation of the paper Self-Distillation for Further Pre-training of Transformers.

Step 1. Download dataset

Download data.tar.gz from here and extract it:

tar zxvf data.tar.gz
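
Before extracting, it can help to confirm that the archive downloaded completely. The following is a minimal sketch, assuming data.tar.gz sits in the current directory; the helper name check_archive is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): confirm the archive
# exists and is readable before extracting it.
check_archive() {
  local archive="$1"
  if [ ! -f "$archive" ]; then
    echo "missing: $archive" >&2
    return 1
  fi
  # tar -tzf lists the contents without extracting anything; it exits
  # non-zero if the gzip stream is truncated or corrupt.
  tar -tzf "$archive" > /dev/null
}

# Extract only if the check passes.
check_archive data.tar.gz && tar zxvf data.tar.gz
```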

Step 2. Further pre-training.

Run the following command. The dataset must be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.

bash "GPU number" "dataset"
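
Because every step takes the same dataset argument, a typo in the name is easy to make and may only surface after a run has started. The sketch below checks a name against the six datasets listed in this README. The function is_valid_dataset and the variable VALID_DATASETS are hypothetical names introduced for illustration; the repository's scripts may already do their own validation.

```shell
# Hypothetical helper (not part of this repository): check a dataset name
# against the six names listed in this README before launching a run.
VALID_DATASETS="aircraft chest_xray cub dtd stanford_dogs vgg_flower_102"

is_valid_dataset() {
  local name="$1"
  local d
  # Compare the argument against each supported name in turn.
  for d in $VALID_DATASETS; do
    if [ "$d" = "$name" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage: report whether a name is accepted.
if is_valid_dataset "cub"; then
  echo "cub: ok"
fi
if ! is_valid_dataset "imagenet"; then
  echo "imagenet: not a supported dataset"
fi
```

The same check can be placed before any of the later steps, since they all accept the same set of dataset names.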

Step 3. Self distillation.

After further pre-training, run the following command for self-distillation. The dataset must be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.

bash "GPU number" "dataset" 

Step 4. Fine-tuning from self-distilled model.

The dataset must be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.

bash "GPU number" "dataset"

(Optional) Fine-tuning from further pre-trained model.

The dataset must be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.

bash "GPU number" "dataset" false further-pretrain-20000

(Optional) Fine-tuning from pre-trained ViT without any further pre-training or self-distillation.

The dataset must be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.

bash "GPU number" "dataset" True


To cite the code/data/paper, please use this BibTeX entry.

title={Self-Distillation for Further Pre-training of Transformers},
author={Seanie Lee and Minki Kang and Juho Lee and Sung Ju Hwang and Kenji Kawaguchi},
booktitle={The Eleventh International Conference on Learning Representations},