Fast Convergence of DETR with Spatially Modulated Co-Attention

Usage

There are no extra compiled components in SMCA DETR and package dependencies are minimal, so the code is simple to use. We provide instructions on how to install the dependencies via conda. First, clone the repository locally:

git clone https://github.com/facebookresearch/detr.git

Then, install PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.
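To confirm that the installed packages satisfy the minimum versions quoted above (PyTorch 1.5+, torchvision 0.6+), a small check along the following lines may help. This is only a sketch and not part of the repository: the helper names `parse_version`, `meets_minimum` and `check_environment` are hypothetical, and the comparison looks only at the leading numeric part of each version string.

```python
import importlib
import re


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Local build suffixes such as "+cu117" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "1.5" compares equal to "1.5.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions quoted in the installation steps above.
REQUIRED = {"torch": "1.5", "torchvision": "0.6"}


def check_environment(required=REQUIRED):
    """Map each package name to "ok", "too old (<version>)" or "missing"."""
    report = {}
    for name, minimum in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        version = getattr(module, "__version__", "0")
        if meets_minimum(version, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({version})"
    return report
```

Calling `check_environment()` in the conda environment returns a dictionary with one status per package, which can be printed before starting a long training run.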

(Optional) To work with panoptic segmentation, install panopticapi:

pip install git+https://github.com/cocodataset/panopticapi.git

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
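Before launching training, it can be worth checking that the dataset directory matches the layout above. The following is a minimal sketch, not code from the repository. The helper name `missing_coco_dirs` is hypothetical, and it only checks that the three sub-directories exist, not that their contents are complete.

```python
from pathlib import Path

# Sub-directories named in the expected layout above.
EXPECTED_DIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(coco_root):
    """Return the expected sub-directories that are absent under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list from `missing_coco_dirs("/path/to/coco")` means all three directories were found. Otherwise the returned names show what still needs to be downloaded or extracted.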

Training

To train single-scale SMCA on a single node with 8 GPUs for 50 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 2 --lr_drop 40 --num_queries 300 --epochs 50 --dynamic_scale type3 --output_dir smca_single_scale

A single epoch takes about 30 minutes, so 50-epoch training takes around 25 hours on a single machine with 8 V100 cards.
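The arithmetic behind this estimate, and the total batch size implied by the command, can be sketched as follows. Two assumptions are made here and are not stated in this README: that `--batch_size` is counted per process (per GPU), as in the upstream DETR training script, and that the 30-minute epoch time stays roughly constant across a run. The function names are hypothetical.

```python
def training_hours(epochs, minutes_per_epoch):
    """Estimated wall-clock hours, assuming a constant time per epoch."""
    return epochs * minutes_per_epoch / 60.0


def effective_batch_size(num_gpus, batch_size_per_gpu):
    """Total images per optimizer step, assuming --batch_size is per GPU."""
    return num_gpus * batch_size_per_gpu


# Values taken from the training command and timing quoted above.
hours = training_hours(epochs=50, minutes_per_epoch=30)              # 25.0
total_batch = effective_batch_size(num_gpus=8, batch_size_per_gpu=2)  # 16
```

Under the same per-epoch assumption, a longer schedule scales linearly: `training_hours(108, 30)` gives 54.0 hours on the same hardware.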

name                            backbone            schedule (epochs)   box AP
SMCA (single scale)             R50                 50                  41.0
SMCA-Container (single scale)   Container-S-Light   50                  44.2
SMCA-Container (single scale)   Container-M         50                  47.3
SMCA (single scale)             R50                 108                 42.7
SMCA (single scale)             R50                 250                 43.5
SMCA (multi scale)              R50                 50                  43.7
SMCA (new multi scale)          R50                 50                  44.4

SMCA has been accepted by ICCV 2021.

The original SMCA code submitted during the ICCV review period is available at:

https://github.com/abc403/SMCA-replication

Release Steps

Single-scale SMCA
Single-scale SMCA with Container-Small
Single-scale SMCA with Container-Medium
New Multi-scale SMCA

Multi-scale Version

If you need multi-scale SMCA-DETR, please email me.

Internship and Research Engineer Opportunities

I am going to join Shanghai AI Lab. My research focuses on general vision and large-scale vision-language pretraining. We offer good research platforms and guidance for our interns and research engineers. If you are interested in an internship or a full-time research engineer position at Shanghai AI Lab, please drop me an email at 1155102382@link.cuhk.edu.hk.

Citation

If you find this repository useful, please consider citing our work:

@article{gao2021fast,
  title={Fast convergence of detr with spatially modulated co-attention},
  author={Gao, Peng and Zheng, Minghang and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2101.07448},
  year={2021}
}

@article{gao2021container,
  title={Container: Context Aggregation Network},
  author={Gao, Peng and Lu, Jiasen and Li, Hongsheng and Mottaghi, Roozbeh and Kembhavi, Aniruddha},
  journal={arXiv preprint arXiv:2106.01401},
  year={2021}
}

Contributors

Peng Gao, Qiu Han

Acknowledgement

This project borrows heavily from DETR and is partially motivated by Sparse R-CNN.
