DilateFormer

Official PyTorch implementation of the IEEE Transactions on Multimedia 2023 paper “DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition”. [paper] [Project Page]

We currently release the PyTorch code for:

  • ImageNet-1K training
  • ImageNet-1K pre-trained weights

ImageNet-1K pre-trained weights

Baidu Netdisk link: [ckpt] Extraction code: q4mu

Google Drive link: [ckpt]

Image classification

Our repository is based on the DeiT repository, with some useful additions:

  1. Calculating accurate FLOPs and parameters with fvcore (see check_model.py).
  2. Auto-resuming.
  3. Saving best models and backup models.
  4. Generating training curves (see generate_tensorboard.py).

Installation

  • Install PyTorch 1.7.0+ and torchvision 0.8.1+

    conda install -c pytorch pytorch torchvision
  • Install other packages

    pip install timm==0.5.4
    pip install fvcore

Training

Run the training script as follows, taking dilateformer_tiny as an example:

bash dist_train.sh dilateformer_tiny [other params]

If training is interrupted abnormally, simply rerun the script to resume automatically. If the latest checkpoint was not saved properly, set the checkpoint to resume from explicitly via --resume ${work_path}/ckpt/backup.pth.
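
The fallback described above can be sketched as follows. This is a minimal illustration and not the repository's actual resume logic: the helper `pick_resume_checkpoint` and the file name `checkpoint.pth` are hypothetical, and only `ckpt/backup.pth` comes from this README. The `load_fn` argument stands in for a loader such as `torch.load`.

```python
import os


def pick_resume_checkpoint(work_path, load_fn, primary_name="checkpoint.pth"):
    """Return the first checkpoint under work_path/ckpt that loads cleanly.

    primary_name is a hypothetical file name; backup.pth is the fallback
    named in this README. load_fn is any callable that raises on a
    corrupt file (for example torch.load in real training code).
    """
    ckpt_dir = os.path.join(work_path, "ckpt")
    for name in (primary_name, "backup.pth"):
        path = os.path.join(ckpt_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            load_fn(path)  # only checks that the file can be read
        except Exception:
            continue  # truncated or corrupt file: try the next candidate
        return path
    return None  # nothing usable: training would start from scratch
```

The returned path, if any, is what you would pass to `--resume`. A real training script would normally reuse the loaded object rather than load the file a second time.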

Generate curves

You can generate the training curves as follows:

python3 generate_tensoboard.py

Note that you should install tensorboardX.
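
As a rough sketch of what such a script might do (this is not the actual contents of the repository's script): assuming the training run writes one JSON object per epoch to a log file, as DeiT-style training loops commonly do, each numeric field can be written as a scalar curve with tensorboardX. The log format, the field names and the helpers `read_log` and `write_curves` are assumptions for illustration.

```python
import json


def read_log(log_path):
    """Parse a JSON-lines log into a list of per-epoch dicts."""
    records = []
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def write_curves(records, out_dir):
    """Write every numeric field of each record as a scalar curve."""
    # Imported lazily so that read_log works without tensorboardX.
    from tensorboardX import SummaryWriter

    writer = SummaryWriter(out_dir)
    for step, rec in enumerate(records):
        epoch = rec.get("epoch", step)  # fall back to the line index
        for key, value in rec.items():
            if key != "epoch" and isinstance(value, (int, float)):
                writer.add_scalar(key, value, epoch)
    writer.close()
```

With a log in that format, `write_curves(read_log("path/to/log.txt"), "path/to/tb_dir")` would produce event files that can be viewed with `tensorboard --logdir path/to/tb_dir`.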

Calculating FLOPs and Parameters

You can calculate the FLOPs and parameters via:

python3 check_model.py

Acknowledgement

This repository is built using the timm library and the DeiT repository.

Citation

If you use this code for a paper, please cite:

DilateFormer

@article{jiao2023dilateformer,
title = {DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition},
author = {Jiao, Jiayu and Tang, Yu-Ming and Lin, Kun-Yu and Gao, Yipeng and Ma, Jinhua and Wang, Yaowei and Zheng, Wei-Shi},
journal = {{IEEE} Transactions on Multimedia},
year = {2023}
}