
[ICLR 2024 poster] Efficient Modulation for Vision Networks


[Efficient Modulation for Vision Networks] (ICLR 2024)

News & TODO & Updates:

  • Improve performance with a better training recipe.
  • Simplify the model by removing unnecessary settings and renaming the classes so the code is easier to follow.
  • Upload a benchmark script to make latency benchmarking easier.

Image Classification

1. Requirements

torch>=1.7.0; torchvision>=0.8.0; pyyaml; timm==0.6.13;

Data preparation: arrange ImageNet in the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
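Before launching a long run, it can help to confirm that the dataset really follows this layout. The following is a minimal sketch, not part of the repository: `summarize_imagenet_layout` is a hypothetical helper, and it assumes the usual `train/` and `val/` split folders under the dataset root, each holding one sub-folder per WordNet ID.

```python
from pathlib import Path

# Image extensions counted by this sketch (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_imagenet_layout(root):
    """Count class folders and image files under root/train and root/val.

    Hypothetical helper: assumes split folders named "train" and "val",
    each containing one sub-folder per class.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Every directory directly under the split is treated as one class.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        n_images = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[split] = {"classes": len(class_dirs), "images": n_images}
    return summary
```

Calling `summarize_imagenet_layout("/path/to/imagenet")` returns a dictionary with per-split class and image counts. For the full ILSVRC2012 classification set, both splits should report 1000 classes.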

2. Pre-trained EfficientMod Models

We have uploaded the checkpoints (trained with distillation) and training logs to Google Drive. Feel free to download them.

| Model | #Params | Image resolution | Top-1 Acc (%) | Download |
| --- | --- | --- | --- | --- |
| EfficientMod-xxs | 4.7M | 224 | 77.1 | [checkpoint & logs] |
| EfficientMod-xs | 6.6M | 224 | 79.4 | [checkpoint & logs] |
| EfficientMod-s | 12.9M | 224 | 81.9 | [checkpoint & logs] |
| EfficientMod-s-Conv (No Distill.) | 12.9M | 224 | 80.5 | [checkpoint & logs] |

3. Validation

To evaluate our EfficientMod models, run:

python3 validate.py /path/to/imagenet  --model {model} -b 256 --checkpoint {/path/to/checkpoint} 

4. Train

We show how to train EfficientMod on 8 GPUs.

python3 -m torch.distributed.launch --nproc_per_node=8 train.py --data {path-to-imagenet} --model {model} -b 256 --lr 4e-3 --amp --model-ema --distillation-type soft --distillation-tau 1 --auto-resume --exp_tag {experiment_tag}
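The `--distillation-type soft --distillation-tau 1` flags select soft-label knowledge distillation at temperature 1. As an illustration only, here is a pure-Python sketch of the standard soft distillation term: the KL divergence from the temperature-softened teacher distribution to the student distribution, scaled by tau squared. That `train.py` uses exactly this formulation, including how it is normalized and weighted against the classification loss, is an assumption. The function names are hypothetical.

```python
import math


def log_softmax(logits, tau=1.0):
    """Numerically stable log-softmax of logits divided by temperature tau."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def soft_distillation_loss(student_logits, teacher_logits, tau=1.0):
    """KL(teacher || student) on softened distributions, scaled by tau**2.

    Sketch for a single sample; batching, averaging, and the mixing weight
    with the cross-entropy loss are left out (assumptions, see lead-in).
    """
    log_p_student = log_softmax(student_logits, tau)
    log_p_teacher = log_softmax(teacher_logits, tau)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * tau * tau
```

The loss is zero when student and teacher produce the same softened distribution and grows as they diverge. With tau set to 1, as in the command above, the logits are compared without extra softening.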

See the detection folder for object detection and instance segmentation on COCO.

See the segmentation folder for semantic segmentation on ADE20K.


    title={Efficient Modulation for Vision Networks},
    author={Xu Ma and Xiyang Dai and Jianwei Yang and Bin Xiao and Yinpeng Chen and Yun Fu and Lu Yuan},
    booktitle={The Twelfth International Conference on Learning Representations},