/LightViT

Official implementation for paper "LightViT: Towards Light-Weight Convolution-Free Vision Transformers"

Primary LanguagePythonApache License 2.0Apache-2.0

LightViT

Official implementation for paper "LightViT: Towards Light-Weight Convolution-Free Vision Transformers".

By Tao Huang, Lang Huang, Shan You, Fei Wang, Chen Qian, Chang Xu.

Updates

July 26, 2022

Code for COCO detection was released.

July 14, 2022

Code for ImageNet training was released.

Introduction

mask

Results on ImageNet-1K

model resolution acc@1 acc@5 #params FLOPs ckpt log
LightViT-T 224x224 78.7 94.4 9.4M 0.7G google drive log
LightViT-S 224x224 80.9 95.3 19.2M 1.7G google drive log
LightViT-B 224x224 82.1 95.9 35.2M 3.9G google drive log

Preparation

  1. Clone training code

    git clone https://github.com/hunto/LightViT.git --recurse-submodules
    cd LightViT/classification

    The code of LightViT model can be found in lib/models/lightvit.py .

  2. Requirements

    torch>=1.3.0
    # if you want to use torch.cuda.amp for mixed-precision training, the lowest torch version is 1.5.0
    timm==0.5.4
  3. Prepare your datasets following this link.

Evaluation

You can evaluate our results using the provided checkpoints. First download the checkpoints into your machine, then run

sh tools/dist_run.sh tools/test.py ${NUM_GPUS} configs/strategies/lightvit/config.yaml timm_lightvit_tiny --drop-path-rate 0.1 --experiment lightvit_tiny_test --resume ${ckpt_file_path}

Train from scratch on ImageNet-1K

sh tools/dist_train.sh 8 configs/strategies/lightvit/config.yaml ${MODEL} --drop-path-rate 0.1 --experiment lightvit_tiny

${MODEL} can be timm_lightvit_tiny, timm_lightvit_small, timm_lightvit_base .

For timm_lightvit_base, we added --amp option to use mixed-precision training, and set drop_path_rate to 0.3.

Throughput

sh tools/dist_run.sh tools/speed_test.py 1 configs/strategies/lightvit/config.yaml ${MODEL} --drop-path-rate 0.1 --batch-size 1024

or

python tools/speed_test.py -c configs/strategies/lightvit/config.yaml --model ${MODEL} --drop-path-rate 0.1 --batch-size 1024

Results on COCO

We conducted experiments on COCO object detection & instance segmentation tasks, see detection/README.md for details.

License

This project is released under the Apache 2.0 license.

Citation

@article{huang2022lightvit,
  title = {LightViT: Towards Light-Weight Convolution-Free Vision Transformers},
  author = {Huang, Tao and Huang, Lang and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal = {arXiv preprint arXiv:2207.05557},
  year = {2022}
}