VTC-LFC


[NeurIPS 2022] VTC-LFC: Vision Transformer Compression with Low-Frequency Components

This repository contains PyTorch evaluation code, training code, and pretrained models for the following project:

  • VTC-LFC (Vision Transformer Compression with Low-Frequency Components), NeurIPS 2022

For details see VTC-LFC: Vision Transformer Compression with Low-Frequency Components by Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, and Hao Li.

Pipeline

[Figure: VTC-LFC framework]

Model Zoo

We provide compressed DeiT models fine-tuned on ImageNet 2012.

name        acc@1  acc@5  #FLOPs  #params  url
DeiT-tiny   71.6   90.7   0.7G    5.1M     model
DeiT-small  79.4   94.6   2.1G    17.7M    model
DeiT-base   81.3   95.3   7.5G    63.5M    model

We also provide the pruned models without fine-tuning.

name                        #FLOPs  #params  url
DeiT-tiny w/o fine-tuning   0.7G    5.1M     model
DeiT-small w/o fine-tuning  2.1G    17.7M    model
DeiT-base w/o fine-tuning   7.5G    63.5M    model
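
The #FLOPs figures above can be reproduced with fvcore, which is listed in the requirements. Below is a minimal sketch, assuming a recent timm where the unpruned baseline is named deit_tiny_patch16_224 (older timm versions used vit_deit_* names); counts may differ slightly from the table depending on counting conventions:

# Counting FLOPs of a DeiT model with fvcore.
# Note: the timm model name below is an assumption for illustration; the
# pruned checkpoints load into this repository's own model definitions.
import torch
import timm
from fvcore.nn import FlopCountAnalysis

model = timm.create_model("deit_tiny_patch16_224", pretrained=False).eval()
flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224))
print(f"{flops.total() / 1e9:.2f} GFLOPs")  # roughly 1.3G for the unpruned tiny model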

Usage

First, clone the repository locally:

git clone

Then, install requirements:

pip install -r requirements.txt
(We use torch 1.7.1 / torchvision 0.8.2 / timm 0.3.4 / fvcore 0.1.5 / CUDA 11.0 and 16G or 32G V100 GPUs for training and evaluation.
Note that we use torch.cuda.amp to accelerate training, which requires PyTorch >= 1.6.)
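
Since torch.cuda.amp is what provides the speedup, here is a minimal sketch of the mixed-precision training step; the model, data, and optimizer are placeholders, not the repository's actual training loop:

# Minimal torch.cuda.amp training step (PyTorch >= 1.6, requires a GPU).
# Everything here is a placeholder standing in for the real model and data.
import torch
import torch.nn as nn

model = nn.Linear(192, 1000).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    images = torch.randn(8, 192, device="cuda")
    targets = torch.randint(0, 1000, (8,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()      # scale loss to avoid fp16 underflow
    scaler.step(optimizer)             # unscale grads, then optimizer step
    scaler.update()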

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's datasets.ImageFolder: the training and validation data go in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
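
For reference, the layout above can be consumed directly by torchvision's ImageFolder; here is a minimal sketch using the standard 224x224 ImageNet evaluation transform (not necessarily the exact augmentation used by this repository's scripts):

# Load the ImageNet layout above with torchvision's ImageFolder.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, num_workers=8)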

Pretrained models

You can download the official pretrained models from here.

name        acc@1  acc@5  #params  url
DeiT-tiny   72.2   91.1   5M       model
DeiT-small  79.9   95.0   22M      model
DeiT-base   81.8   95.6   86M      model

Pruning

To prune a pre-trained DeiT model on ImageNet:

python -m torch.distributed.launch --master_port 29500 --nproc_per_node=2 --use_env ./pruning/pruning_py/pruning_deit.py \
--device cuda \
--data-set IMNET \
--batch-size 128 \
--num-classes 1000 \
--dist-eval \
--model deit_tiny_cfged_patch16_224 \
--data-path /path/to/imagenet \
--resume /path/to/pretrained_model \
--prune-part qkv,fc1 \
--prune-criterion lfs \
--prune-block-id 0,1,2,3,4,5,6,7,8,9,10,11 \
--cutoff-channel 0.1 \
--cutoff-token 0.85 \
--lfs-lambda 0.1 \
--num-samples 2000 \
--keep-qk-scale \
--prune-mode bcp \
--allowable-drop 9.5 \
--drop-for-token 0.56 \
--prune --prune-only \
--output-dir /path/to/save
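
The lfs criterion scores importance using low-frequency components, with --cutoff-channel and --cutoff-token controlling the low-pass cutoffs. The repository's actual scoring code is not reproduced here; purely as an illustration of what a radial low-pass filter in the Fourier domain looks like (torch.fft.fft2/fftshift require PyTorch >= 1.8), here is a hypothetical sketch:

# Hypothetical radial low-pass filter; illustrates the "keep only low
# frequencies" idea behind the lfs criterion, not the paper's exact filter.
import torch

def low_pass(x: torch.Tensor, cutoff: float) -> torch.Tensor:
    """Zero out spatial frequencies beyond `cutoff` (a fraction in (0, 1])."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))  # DC at center
    h, w = x.shape[-2], x.shape[-1]
    ys = (torch.arange(h) - h // 2).float().view(-1, 1)
    xs = (torch.arange(w) - w // 2).float().view(1, -1)
    radius = (ys ** 2 + xs ** 2).sqrt()          # distance from the DC component
    mask = (radius <= cutoff * radius.max()).to(freq.dtype)
    return torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real

filtered = low_pass(torch.randn(1, 3, 224, 224), cutoff=0.1)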

Fine-tuning

To fine-tune a pruned DeiT model on ImageNet:

python -m torch.distributed.launch --master_port 29500 --nproc_per_node=8 --use_env ./pruning/pruning_py/pruning_deit.py \
--device cuda \
--epochs 300 \
--lr 0.0001 \
--data-set IMNET \
--num-classes 1000 \
--dist-eval \
--model deit_tiny_cfged_patch16_224 \
--batch-size 512 \
--teacher-model deit_tiny_cfged_patch16_224 \
--distillation-alpha 0.1 \
--distillation-type hard \
--warmup-epochs 0 \
--keep-qk-scale \
--finetune-only \
--output-dir /path/to/save \
--data-path /path/to/imagenet \
--teacher-path /path/to/original_model \
--resume /path/to/pruned_model
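
With --distillation-type hard, the student is supervised by the teacher's argmax predictions in addition to the ground-truth labels, as in DeiT. Below is a minimal sketch of such a loss; the function name and the exact way --distillation-alpha enters the blend are assumptions, not the repository's implementation:

# Hypothetical hard-label distillation loss in the spirit of DeiT.
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_logits, teacher_logits, targets, alpha=0.1):
    base = F.cross_entropy(student_logits, targets)     # ground-truth CE
    hard = teacher_logits.argmax(dim=1)                 # teacher's hard labels
    distill = F.cross_entropy(student_logits, hard)     # CE against the teacher
    return (1 - alpha) * base + alpha * distill

# usage with dummy logits
loss = hard_distillation_loss(torch.randn(8, 1000), torch.randn(8, 1000),
                              torch.randint(0, 1000, (8,)), alpha=0.1)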

Citation

If you use this code for a paper, please cite:

@inproceedings{wang2022vtclfc,
title={{VTC}-{LFC}: Vision Transformer Compression with Low-Frequency Components},
author={Zhenyu Wang and Hao Luo and Pichao WANG and Feng Ding and Fan Wang and Hao Li},
booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022},
url={https://openreview.net/forum?id=HuiLIB6EaOk}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests!