This is the official implementation of the paper "Self-Distillation for Further Pre-training of Transformers" (ICLR 2023).
Download data.tar.gz from here and extract it:
tar zxvf data.tar.gz
Run the following command for further pre-training. The dataset can be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.
bash run_pretrain.sh "GPU number" "dataset"
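For example, to further pre-train on the cub dataset using GPU 0 (the GPU index 0 is just an illustrative placeholder; any available GPU works):
bash run_pretrain.sh 0 cub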
After further pre-training, run the following command for self-distillation. The dataset can again be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.
bash run_selftrain.sh "GPU number" "dataset"
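Continuing the illustrative example above, self-distillation on cub with GPU 0 would be:
bash run_selftrain.sh 0 cub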
After self-distillation, run the following command for fine-tuning. The dataset can be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.
bash run_finetune.sh "GPU number" "dataset"
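For instance, fine-tuning on cub with GPU 0 (again, the GPU index is only a placeholder):
bash run_finetune.sh 0 cub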
To fine-tune from the further-pretrain-20000 checkpoint instead, run the following command. The dataset can be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.
bash run_finetune.sh "GPU number" "dataset" false further-pretrain-20000
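Using the same illustrative GPU index and dataset as above, with the extra arguments taken verbatim from the template, this would be:
bash run_finetune.sh 0 cub false further-pretrain-20000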
For the following variant, the dataset can likewise be one of aircraft, chest_xray, cub, dtd, stanford_dogs, or vgg_flower_102.
bash run_finetune.sh "GPU number" "dataset" True
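And the corresponding illustrative invocation for this variant (GPU index and dataset are again placeholders):
bash run_finetune.sh 0 cub True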
To cite the code, data, or paper, please use the following BibTeX entry.
@inproceedings{lee2023selfdistillation,
  title={Self-Distillation for Further Pre-training of Transformers},
  author={Seanie Lee and Minki Kang and Juho Lee and Sung Ju Hwang and Kenji Kawaguchi},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=kj6oK_Hj40}
}