
A PyTorch reproduction of the paper "Two-Stream Transformer for Multi-Label Image Classification" (ACM MM 2022).


Two-Stream Transformer for Multi-Label Image Classification



Data Preparation

  1. Download the datasets and organize them as follows:
|---- MSCOCO
|---- NUS-WIDE
|---- VOC2007
  2. Preprocess the data with the following commands:
python scripts/mscoco.py
python scripts/nuswide.py
python scripts/voc2007.py
python embedding.py --data [mscoco, nuswide, voc2007]
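The two preparation steps above can be sketched as a small Python driver that checks the folder layout and then runs the preprocessing commands in the order listed. This is a minimal sketch, not part of the repository: prepare and preprocessing_commands are hypothetical helper names, data_root is a hypothetical argument for the directory holding the three dataset folders (the root path is not given above), and it assumes the scripts locate the data on their own and are run from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# --data value -> dataset folder name, taken from the layout above.
DATASET_DIRS = {
    "mscoco": "MSCOCO",
    "nuswide": "NUS-WIDE",
    "voc2007": "VOC2007",
}


def preprocessing_commands(datasets=tuple(DATASET_DIRS)):
    """List the commands in the order given above: every scripts/<dataset>.py
    first, then embedding.py once per dataset."""
    scripts = [[sys.executable, f"scripts/{d}.py"] for d in datasets]
    embeddings = [[sys.executable, "embedding.py", "--data", d] for d in datasets]
    return scripts + embeddings


def prepare(data_root, dry_run=True):
    """Check the folder layout, then run (or, with dry_run=True, only return)
    the preprocessing commands. data_root is hypothetical: the directory that
    contains MSCOCO/, NUS-WIDE/ and VOC2007/."""
    missing = [name for name in DATASET_DIRS.values()
               if not (Path(data_root) / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing dataset folders under {data_root}: {missing}")

    commands = preprocessing_commands()
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Calling prepare("path/to/datasets") first lists the six commands without running them; pass dry_run=False once the layout check succeeds.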


Requirements

torch >= 1.9.0
torchvision >= 0.10.0
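The two minimum versions above can be checked before training. This is a minimal sketch using the standard library's importlib.metadata; parse_version and check_requirements are hypothetical helper names, and the parser deliberately keeps only the numeric release segment, so local tags such as "+cu111" and pre-release suffixes are ignored. Versions are compared as integer tuples because plain string comparison would rank "1.10.0" below "1.9.0"; packaging.version.Version is the more robust choice if that package is available.

```python
from importlib import metadata

# Minimum versions listed in the requirements above.
MINIMUM_VERSIONS = {"torch": (1, 9, 0), "torchvision": (0, 10, 0)}


def parse_version(text):
    """Turn '1.9.0+cu111' or '0.10' into a comparable tuple like (1, 9, 0)."""
    release = text.split("+")[0]  # drop local tags such as +cu111
    parts = []
    for piece in release.split(".")[:3]:
        digits = ""
        for ch in piece:  # keep only the leading digits, e.g. '0rc1' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{package} {installed} < {wanted}")
    return problems
```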


Training

Use the following commands to train the model on each dataset:

python train.py --data mscoco --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 9
python train.py --data nuswide --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 1
python train.py --data voc2007 --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 4
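The three training commands above differ only in --data and --start_depth. A minimal sketch that builds them from one table and runs them one after another; train_command and train_all are hypothetical helper names, and the sketch passes exactly the flags shown above without interpreting what --mode or --start_depth do.

```python
import subprocess
import sys

# --start_depth per dataset, as in the commands above.
START_DEPTH = {"mscoco": 9, "nuswide": 1, "voc2007": 4}

# Flags shared by all three commands.
COMMON_FLAGS = [
    "--batch_size", "16",
    "--optimizer", "AdamW",
    "--lr", "0.00001",
    "--mode", "part",
]


def train_command(dataset):
    """Build the argv list for one of the training runs listed above."""
    if dataset not in START_DEPTH:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(START_DEPTH)}")
    return [sys.executable, "train.py", "--data", dataset, *COMMON_FLAGS,
            "--start_depth", str(START_DEPTH[dataset])]


def train_all(datasets=("mscoco", "nuswide", "voc2007"), dry_run=True):
    """Run (or, with dry_run=True, only return) the training commands in sequence."""
    commands = [train_command(d) for d in datasets]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # abort if a run fails
    return commands
```

Keeping the per-dataset value in a single dictionary makes it harder for the three commands to drift apart when a shared flag such as --lr is changed.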


Evaluation

Pre-trained weights are available on Google Drive. Download them, place them in the experiments folder, and then use the following commands to reproduce the results reported in the paper:

python evaluate.py --exp experiments/TSFormer_mscoco/exp1    # Microsoft COCO
python evaluate.py --exp experiments/TSFormer_nuswide/exp1   # NUS-WIDE
python evaluate.py --exp experiments/TSFormer_voc2007/exp1   # Pascal VOC 2007
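Because the evaluation commands depend on weights downloaded by hand, a small wrapper can skip experiments whose folder is missing instead of launching evaluate.py against a path that does not exist. This is a minimal sketch; evaluate_all and repo_root are hypothetical names, and it only checks that each experiment folder exists, not which files evaluate.py expects inside it.

```python
import subprocess
import sys
from pathlib import Path

# Experiment folders from the commands above, one per dataset.
EXPERIMENTS = {
    "Microsoft COCO": "experiments/TSFormer_mscoco/exp1",
    "NUS-WIDE": "experiments/TSFormer_nuswide/exp1",
    "Pascal VOC 2007": "experiments/TSFormer_voc2007/exp1",
}


def evaluate_all(repo_root=".", dry_run=True):
    """Return (and, unless dry_run, run) the evaluate.py commands whose
    experiment folder exists under repo_root; missing folders are reported."""
    commands = []
    for name, exp in EXPERIMENTS.items():
        if not (Path(repo_root) / exp).is_dir():
            print(f"skipping {name}: {exp} not found")
            continue
        cmd = [sys.executable, "evaluate.py", "--exp", exp]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True, cwd=repo_root)
    return commands
```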

Main Results

Dataset    | mAP  | Ours | Baseline | Baseline (ours, large learning rate)
VOC 2007   | 97.0 | 97.0 | 95.5     | 95.72
MS-COCO    | 88.9 | 88.9 | 85.9     | 87.04
NUS-WIDE   | 69.3 | 69.3 | 66.2     | 67.35
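The table reports mAP. A common definition for multi-label classification is non-interpolated average precision computed per class and then averaged over classes. The following is a minimal pure-Python sketch of that definition, assuming evaluate.py follows it; its exact handling of tied scores, interpolation, or classes without positive images may differ. average_precision and mean_average_precision are hypothetical names.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted score per image; labels: 1 if the image has the class.
    Returns None when the class has no positive image. Tied scores keep
    their input order because sorted() is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / hits if hits else None


def mean_average_precision(score_matrix, label_matrix):
    """mAP in percent; rows are images, columns are classes."""
    aps = []
    for c in range(len(score_matrix[0])):
        ap = average_precision([row[c] for row in score_matrix],
                               [row[c] for row in label_matrix])
        if ap is not None:  # skip classes with no positives
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive label")
    return 100.0 * sum(aps) / len(aps)
```

For example, with four images and two classes, scores [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]] and labels [[1, 0], [0, 1], [1, 1], [0, 0]] give per-class AP of 5/6 and 1, so mean_average_precision returns about 91.67.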


Citation

  title={Two-Stream Transformer for Multi-Label Image Classification},
  author={Zhu, Xuelin and Cao, Jiuxin and Ge, Jiawei and Liu, Weijia and Liu, Bo},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},