
[NeurIPS 2023] This is the official implementation of the paper "DAC-DETR: Divide the Attention Layers and Conquer".


DAC-DETR

This is the official implementation (PyTorch and PaddlePaddle) of the paper "DAC-DETR: Divide the Attention Layers and Conquer".

Authors: Zhengdong Hu, Yifan Sun, Jingdong Wang, Yi Yang

📧 Contact: huzhengdongcs@gmail.com

News

[Sep. 22, 2023] "DAC-DETR: Divide the Attention Layers and Conquer" has been accepted at NeurIPS 2023 as a poster.

Methods

This paper reveals a characteristic of the DEtection TRansformer (DETR) that negatively impacts its training efficacy: the cross-attention and self-attention layers in the DETR decoder have opposing effects on the object queries (though both effects are important). Specifically, we observe that cross-attention tends to gather multiple queries around the same object, while self-attention disperses these queries far apart. To improve training efficacy, we propose Divide-And-Conquer DETR (DAC-DETR), which separates the cross-attention from this opposition so that it can be learned more effectively. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder shares all other parameters with the original decoder, but has NO self-attention layers and employs one-to-many label assignment to strengthen the gathering effect. Experiments show that DAC-DETR brings remarkable improvement over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and achieves 50.9 AP (ResNet-50) / 58.1 AP (Swin-Large) when built on some popular methods (i.e., DINO and an IoU-related loss).
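To make the difference between the two label-assignment schemes concrete, here is a minimal pure-Python sketch. It is not the code from this repository: the greedy matcher (a simple stand-in for one-to-one matching), the IoU-threshold rule for one-to-many assignment, the function names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
# Illustrative sketch only; not the DAC-DETR implementation.
# Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_one(query_boxes, gt_boxes):
    """Greedy stand-in for one-to-one matching: each object gets at most one query."""
    pairs = sorted(
        ((iou(q, g), qi, gi)
         for qi, q in enumerate(query_boxes)
         for gi, g in enumerate(gt_boxes)),
        reverse=True,
    )
    used_q, used_g, match = set(), set(), {}
    for score, qi, gi in pairs:
        if score > 0 and qi not in used_q and gi not in used_g:
            match[gi] = [qi]
            used_q.add(qi)
            used_g.add(gi)
    return match


def one_to_many(query_boxes, gt_boxes, iou_thresh=0.5):
    """Assumed rule: a query is a positive for its best-overlapping object
    whenever that overlap reaches iou_thresh, so one object can own many queries."""
    match = {gi: [] for gi in range(len(gt_boxes))}
    if not gt_boxes:
        return match
    for qi, q in enumerate(query_boxes):
        scores = [iou(q, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=scores.__getitem__)
        if scores[best] >= iou_thresh:
            match[best].append(qi)
    return match


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
queries = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 29), (50, 50, 60, 60)]
print(one_to_one(queries, gt))   # {0: [0], 1: [2]}
print(one_to_many(queries, gt))  # {0: [0, 1], 1: [2]}
```

In this toy case the second query overlaps the first object well but is left unmatched by the one-to-one scheme, whereas the one-to-many scheme treats it as an additional positive for that object.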

Analysis

We count the average number of queries that have large affinity with each object. Compared with the baseline, DAC-DETR 1) has more queries for each object, and 2) improves the quality of the closest queries. The y-axis denotes "avg number of queries / object".
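The statistic described here can be sketched as follows. The affinity matrix layout (queries as rows, objects as columns), the 0.5 threshold, and the function name are assumptions for illustration; the text does not specify how affinity is computed or what counts as "large".

```python
def avg_queries_per_object(affinity, thresh=0.5):
    """Average, over objects, of the number of queries whose affinity exceeds thresh.

    affinity[q][o] is the affinity between query q and object o (assumed layout).
    """
    if not affinity or not affinity[0]:
        return 0.0
    num_objects = len(affinity[0])
    # For each object (column), count the queries (rows) above the threshold.
    counts = [sum(1 for row in affinity if row[o] > thresh) for o in range(num_objects)]
    return sum(counts) / num_objects


# Hypothetical affinities for 4 queries and 2 objects.
affinity = [
    [0.9, 0.1],
    [0.7, 0.0],
    [0.2, 0.8],
    [0.1, 0.3],
]
print(avg_queries_per_object(affinity))  # 1.5
```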

Installation

We use python=3.7.10, pytorch=1.8.0, cuda=11.1.

Clone the repo

git clone https://github.com/huzhengdongcs/DAC-DETR.git
cd DAC-DETR

Prepare environments

sh env_run.sh

Data

mkdir ./data/

data/
  └── coco/
     ├── train2017/
     ├── val2017/
     └── annotations/
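Before training, it can help to confirm that the directories are in place. The helper below is a small sketch and not part of this repository; the function name `missing_coco_dirs` is hypothetical, and it only checks that the three sub-directories listed above exist, not their contents.

```python
from pathlib import Path

# Sub-directories expected under data/coco, as listed in the layout above.
EXPECTED = ("train2017", "val2017", "annotations")


def missing_coco_dirs(root="data"):
    """Return the expected sub-directories of <root>/coco that do not exist."""
    coco = Path(root) / "coco"
    return [name for name in EXPECTED if not (coco / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs()
    print("COCO layout OK" if not missing else f"Missing under data/coco: {missing}")
```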

Pretrain backbones

mkdir ./initmodel

Download the pretrained ResNet-50 and Swin Transformer backbones and put them into ./initmodel

Run

Please note that our implementations are based on 8 A100 or 8 V100 GPUs.

For example, you can train dac_cdn_ice for 12 epochs with a Res50 backbone by running:

sh train.sh

Eval

The trained models are saved in the output directory.

For example, you can test dac_cdn_ice trained for 12 epochs with a Res50 backbone by running:

sh test.sh

Models

Name         Backbone    Epochs  AP    Model          Log
dac_cdn      Res50       12      50.0  Google, Baidu  Google, Baidu
dac_cdn      Res50       24      51.2  Google, Baidu  Google, Baidu
dac_cdn_ice  Res50       12      50.9  Google, Baidu  Google, Baidu
dac_cdn_ice  Res50       24      52.1  Google, Baidu  Google, Baidu
dac_cdn      Swin_Large  12      57.3  Google, Baidu  Google, Baidu
dac_cdn_ice  Swin_Large  12      58.1  Google, Baidu  Google, Baidu
dac_cdn_ice  Swin_Large  24      59.3  Google, Baidu  Google, Baidu

Notes

You can access the PyTorch code of 'dac-detr + contrastive denoising (cdn)' and the model from:

  1. Baidu Netdisk
  2. Google Drive

Citing DAC-DETR

If you find DAC-DETR useful for your research, please consider citing:

@inproceedings{
hu2023dacdetr,
title={{DAC}-{DETR}: Divide the Attention Layers and Conquer},
author={Zhengdong Hu and Yifan Sun and Jingdong Wang and Yi Yang},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=8JMexYVcXB}
}