This is the official implementation (PyTorch and PaddlePaddle) of the paper "DAC-DETR: Divide the Attention Layers and Conquer".
Authors: Zhengdong Hu, Yifan Sun, Jingdong Wang, Yi Yang
📧 Contact: huzhengdongcs@gmail.com
[Sep. 22, 2023] DAC-DETR: Divide the Attention Layers and Conquer has been accepted at NeurIPS 2023 as a poster.
This paper reveals a characteristic of DEtection TRansformer (DETR) that negatively impacts its training efficacy, i.e., the cross-attention and self-attention layers in the DETR decoder have contrary impacts on the object queries (though both impacts are important). Specifically, we observe that the cross-attention tends to gather multiple queries around the same object, while the self-attention disperses these queries far away. To improve the training efficacy, we propose a Divide-And-Conquer DETR (DAC-DETR) that divides the cross-attention out from this conflict for better conquering. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder, while sharing all the other parameters, has NO self-attention layers and employs one-to-many label assignment to improve the gathering effect. Experiments show that DAC-DETR brings remarkable improvement over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and achieves 50.9 AP (ResNet-50) / 58.1 AP (Swin-Large) when built on some popular methods (i.e., DINO and an IoU-related loss).
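For intuition, the sketch below illustrates the divide-and-conquer training idea in PyTorch. It is a minimal illustration under our own naming (`DACDecoderLayerSketch`, `forward_main`, and `forward_aux` are assumptions, not the repository's actual classes): the auxiliary path reuses the cross-attention and FFN weights of the main decoder layer but skips self-attention, and its predictions are supervised with one-to-many label assignment.

```python
import torch.nn as nn

class DACDecoderLayerSketch(nn.Module):
    """Minimal sketch of one DAC-DETR decoder layer (illustrative only,
    not the repository's actual implementation)."""

    def __init__(self, d_model=256, nhead=8, dim_ffn=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dim_ffn), nn.ReLU(), nn.Linear(dim_ffn, d_model)
        )

    def forward_main(self, queries, memory):
        # Main decoder path: self-attention + cross-attention + FFN,
        # trained with the usual one-to-one (Hungarian) label assignment.
        q = queries + self.self_attn(queries, queries, queries)[0]
        q = q + self.cross_attn(q, memory, memory)[0]
        return q + self.ffn(q)

    def forward_aux(self, queries, memory):
        # Auxiliary decoder path (training only): NO self-attention; the
        # cross-attention and FFN weights are shared with the main path,
        # and the outputs are supervised with one-to-many label assignment
        # to strengthen the query-gathering effect of cross-attention.
        q = queries + self.cross_attn(queries, memory, memory)[0]
        return q + self.ffn(q)
```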
We count the average number of queries that have large affinity with each object. Compared with the baseline, DAC-DETR 1) has more queries for each object, and 2) improves the quality of the closest queries. The y-axis denotes "avg. number of queries / object".
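As a rough illustration of how such a statistic could be computed (the affinity definition below, classification score multiplied by IoU, and the threshold are our assumptions for the sketch, not necessarily the paper's exact metric):

```python
import torch
from torchvision.ops import box_iou

def count_queries_per_object(pred_boxes, pred_scores, gt_boxes, thresh=0.5):
    """Count, for each ground-truth box, how many queries have a large
    affinity with it. Affinity = class score * IoU (an illustrative choice).

    pred_boxes:  (num_queries, 4) in (x1, y1, x2, y2) format
    pred_scores: (num_queries,) foreground scores for the matched class
    gt_boxes:    (num_objects, 4) in (x1, y1, x2, y2) format
    """
    iou = box_iou(gt_boxes, pred_boxes)             # (num_objects, num_queries)
    affinity = iou * pred_scores.unsqueeze(0)       # broadcast scores over objects
    return (affinity > thresh).sum(dim=1).float()   # queries per object
```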
We use python=3.7.10, pytorch=1.8.0, cuda=11.1.
Clone the repo
git clone https://github.com/huzhengdongcs/DAC-DETR.git
cd DAC-DETR
Prepare environments
sh env_run.sh
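After the environment is set up, a quick sanity check (our suggestion, not part of the repository's scripts) can confirm the PyTorch/CUDA versions match the ones above:

```python
import torch

# Expect roughly: torch 1.8.0, CUDA 11.1, and at least one visible GPU.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("gpus:", torch.cuda.device_count())
```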
mkdir ./data/
data/
├── coco/
│   ├── train2017/
│   ├── val2017/
│   └── annotations/
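Optionally, you can verify the COCO layout with pycocotools (a sanity check we suggest; the annotation filename below is the standard COCO 2017 name and is assumed rather than dictated by the repository):

```python
from pycocotools.coco import COCO

# Loads the standard val2017 instance annotations to confirm the data layout.
coco = COCO("data/coco/annotations/instances_val2017.json")
print("val2017 images:", len(coco.getImgIds()))
print("categories:", len(coco.getCatIds()))
```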
mkdir ./initmodel
You can download the ResNet-50 and Swin Transformer pretrained weights and put them into ./initmodel.
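A quick way to confirm a downloaded backbone checkpoint is readable (the filename below is a hypothetical example; use whatever name the downloaded file has):

```python
import torch

# "initmodel/resnet50.pth" is a hypothetical filename used for illustration.
ckpt = torch.load("initmodel/resnet50.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print("loaded", len(state), "entries from the backbone checkpoint")
```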
Please note that our implementations are based on 8 A100 or 8 V100 GPUs.
For example, you can train dac_cdn_ice (ResNet-50, 12 epochs) by running
sh train.sh
The trained models are saved in output.
For example, you can test dac_cdn_ice (ResNet-50, 12 epochs) by running
sh test.sh
Name | Backbone | Epochs | AP | Model | Log |
---|---|---|---|---|---|
dac_cdn | Res50 | 12 | 50.0 | Google, Baidu | Google, Baidu |
dac_cdn | Res50 | 24 | 51.2 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Res50 | 12 | 50.9 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Res50 | 24 | 52.1 | Google, Baidu | Google, Baidu |
dac_cdn | Swin_Large | 12 | 57.3 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Swin_Large | 12 | 58.1 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Swin_Large | 24 | 59.3 | Google, Baidu | Google, Baidu |
You can access the PyTorch code and model of 'DAC-DETR + contrastive denoising (CDN)' from:
If you find DAC-DETR useful to your research, please consider citing:
@inproceedings{hu2023dacdetr,
title={{DAC}-{DETR}: Divide the Attention Layers and Conquer},
author={Zhengdong Hu and Yifan Sun and Jingdong Wang and Yi Yang},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=8JMexYVcXB}
}