
This is the official PyTorch implementation of ASAG (ICCV 2023).


ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation



1 Introduction

  • Recent sparse detectors with multiple (e.g. six) decoder layers achieve promising performance but incur long inference times because of their complex heads. Previous works have explored using dense priors as initialization to build one-decoder-layer detectors. Although these gain remarkable acceleration, their performance still lags behind their six-decoder-layer counterparts by a large margin. In this work, we aim to bridge this performance gap while retaining fast speed. We find that the architecture discrepancy between dense and sparse detectors leads to feature conflict, hampering the performance of one-decoder-layer detectors. We therefore propose the Adaptive Sparse Anchor Generator (ASAG), which predicts dynamic anchors on patches rather than grids in a sparse way, alleviating the feature conflict problem. For each image, ASAG dynamically selects which feature maps and which locations to predict on, yielding a fully adaptive way to generate image-specific anchors. Further, a simple and effective Query Weighting method eases the training instability caused by adaptiveness. Extensive experiments show that our method outperforms dense-initialized ones and achieves a better speed-accuracy trade-off.
  • Our ASAG starts predicting dynamic anchors from fixed feature maps and then adaptively explores large feature maps using Adaptive Probing, which runs top-down and coarse-to-fine. We can even discard large feature maps manually for efficient inference.
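
To make the top-down, coarse-to-fine idea concrete, here is a toy sketch. It is not the repository's implementation: ASAG predicts anchors on patches with learned modules, whereas this example only assumes that each pyramid level is a 2D grid of scalar scores, twice as fine as the previous one, and that a visited cell is expanded into its four children when its score exceeds a threshold. The name `probe_coarse_to_fine`, the threshold rule, and the score grids are all hypothetical.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]


def probe_coarse_to_fine(pyramid: List[Grid], threshold: float) -> List[Set[Tuple[int, int]]]:
    """Return, per level (coarsest first), the set of (row, col) cells visited.

    Hypothetical rule: every cell of the coarsest level is visited; a visited
    cell whose score exceeds `threshold` exposes its 2x2 children on the next,
    finer level. Cells that are never exposed are never looked at.
    """
    coarsest = pyramid[0]
    visited = [{(r, c) for r in range(len(coarsest)) for c in range(len(coarsest[0]))}]
    for level in range(1, len(pyramid)):
        parent_scores = pyramid[level - 1]
        children = set()
        for (r, c) in visited[-1]:
            if parent_scores[r][c] > threshold:
                for dr in (0, 1):
                    for dc in (0, 1):
                        children.add((2 * r + dr, 2 * c + dc))
        visited.append(children)
    return visited


if __name__ == "__main__":
    pyramid = [
        [[0.9, 0.1]],                                    # 1x2 coarse level
        [[0.8, 0.2, 0.0, 0.0], [0.1, 0.1, 0.0, 0.0]],    # 2x4 finer level
        [[0.0] * 8 for _ in range(4)],                   # 4x8 finest level
    ]
    for level, cells in enumerate(probe_coarse_to_fine(pyramid, threshold=0.5)):
        print(level, sorted(cells))
```

In this toy input only 4 of the 32 finest-level cells are ever visited, which illustrates why skipping unpromising regions of large feature maps can save computation.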

2 Model Zoo

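| | name | backbone | epoch | #queries | box AP | Where in Our Paper |
|---|---|---|---|---|---|---|
| 1 | ASAG-A | R50 | 12 | 107 | 42.6 | Table 2 |
| 2 | ASAG-A | R50 | 12 | 329 | 43.6 | Table 2 |
| 3 | ASAG-A | R50 | 36 | 102 | 45.3 | Table 4 |
| 4 | ASAG-A | R50 | 36 | 312 | 46.3 | Table 4 |
| 5 | ASAG-A | R101 | 36 | 296 | 47.5 | Table 4 |
| 6 | ASAG-S | R50 | 36 | 100 | 43.9 | Table 3 & 4 |
| 7 | ASAG-S | R50 | 36 | 312 | 45.0 | Table 3 & 4 |
| 8 | ASAG-A-dn | R50 | 12 | 106 | 43.1 | Table A-1 |
| 9 | ASAG-A-crosscl | R50 | 12 | 103 | 43.8 | |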
  • Notes:

    • All checkpoints and logs can be found on Google Drive / Baidu (pwd: asag).
    • Results in the above table are evaluated on the COCO dataset.
    • In ASAG, we use 4 parallel decoders, most of which perform similarly (within ~0.2 AP).
    • To test speed, users need to slightly modify the code:
      • use only one decoder: --num_decoder_layers 1
      • use the fast_inference API rather than forward in models/anchor_generator.py
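
For the speed test described in the notes above, a generic latency harness might look like the sketch below. This is an assumption about how one could measure, not the authors' benchmarking script: `measure_latency` and its parameters are hypothetical names, and the dummy workload stands in for a call to the model. CUDA kernels run asynchronously, so when timing on a GPU pass a `sync` callable such as `torch.cuda.synchronize`.

```python
import time
from typing import Callable, Optional


def measure_latency(fn: Callable[[], object], warmup: int = 5, iters: int = 20,
                    sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean seconds per call of `fn`, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued asynchronous work before stopping the clock
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    # Dummy CPU workload; replace with a closure that runs the detector once.
    seconds = measure_latency(lambda: sum(range(10_000)))
    print(f"{seconds * 1e3:.3f} ms per call, {1.0 / seconds:.1f} calls/s")
```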

3 Data preparation

Download and extract COCO 2017 train and val images with annotations from here.

We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
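
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch that checks only the three sub-directories listed in this README; the helper name `missing_coco_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Sub-directories listed in the expected layout above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(coco_root: str) -> List[str]:
    """Return the expected sub-directories that are absent under `coco_root`."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "path/to/coco"
    missing = missing_coco_dirs(root)
    print("layout OK" if not missing else f"missing under {root}: {missing}")
```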

4 Usage

  • To avoid confusion between different ImageNet-pretrained checkpoints, users must manually download the corresponding checkpoint version from TorchVision (i.e. R50v1 and R101v1).
  • Our environment
    • NVIDIA RTX 3090
    • python: 3.7.12
    • Torch: 1.10.2 + cu113
    • Torchvision: 0.11.3 + cu113
ASAG-A (1x, R50, 100 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2
ASAG-A (1x, R50, 300 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R50, 100 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_100.pth --used_head main
ASAG-A (3x, R50, 300 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R101, 300 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --eval --resume ASAG_A_r101_3x_300.pth --used_head aux_2 --num_query 300
ASAG-S (3x, R50, 100 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x --decoder_type SparseRCNN

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --decoder_type SparseRCNN --resume ASAG_S_r50_3x_100.pth --used_head aux_2
ASAG-S (3x, R50, 300 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x --decoder_type SparseRCNN

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_r50_3x_300.pth --used_head aux_2 --num_query 300 --decoder_type SparseRCNN
ASAG-A+dn (1x, R50, 100 queries)

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --use_dn --fix_noise_scale

Inference

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_dn.pth --used_head aux_2
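
The commands in this section differ only in a handful of flags, so they can be assembled programmatically. The sketch below uses only flags that appear in this README; the function `build_command` and its parameter names are hypothetical conveniences, not part of the repository, and the function only builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_command(coco_path: str, backbone: str = "resnet50",
                  num_query: Optional[int] = None, schedule: Optional[str] = None,
                  decoder_type: Optional[str] = None,
                  pretrained_checkpoint: Optional[str] = None,
                  resume: Optional[str] = None, used_head: Optional[str] = None,
                  nproc_per_node: int = 4, batch_size: int = 4) -> List[str]:
    """Assemble a training command, or an evaluation command when `resume` is given."""
    cmd = ["python", "-m", "torch.distributed.launch",
           f"--nproc_per_node={nproc_per_node}", "--use_env", "main.py",
           "--coco_path", coco_path, "--batch_size", str(batch_size),
           "--output_dir", "output", "--backbone", backbone]
    if resume is not None:
        # Evaluation: load a trained checkpoint and pick which head to report.
        cmd += ["--eval", "--resume", resume]
        if used_head is not None:
            cmd += ["--used_head", used_head]
    else:
        # Training: start from an ImageNet-pretrained backbone checkpoint.
        if pretrained_checkpoint is not None:
            cmd += ["--pretrained_checkpoint", pretrained_checkpoint]
        if schedule is not None:
            cmd += ["--training_schedule", schedule]
    if num_query is not None:
        cmd += ["--num_query", str(num_query)]
    if decoder_type is not None:
        cmd += ["--decoder_type", decoder_type]
    return cmd


if __name__ == "__main__":
    train = build_command("YOUR_COCO_PATH", num_query=300, schedule="3x",
                          pretrained_checkpoint="YOUR_DOWNLOADED_CHECKPOINT")
    print(" ".join(shlex.quote(arg) for arg in train))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems with paths that contain spaces.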

5 Efficient inference

  • Taking ASAG-A (1x, R50, 100 queries) as an example.

  • --used_inference_level accepts one of ['P3P6', 'P4P6', 'P5P6'].

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2 --used_inference_level P5P6
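
To compare the three inference levels, one could generate an evaluation command per level, as in the sketch below. The base command and the level names are copied from this README; the helper `sweep_commands` is hypothetical, and it only builds the commands rather than executing them.

```python
import shlex

LEVELS = ["P3P6", "P4P6", "P5P6"]  # values listed in this README

BASE = ("python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py "
        "--coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output "
        "--backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2")


def sweep_commands(levels=LEVELS):
    """Return one evaluation command (as an argument list) per inference level."""
    commands = []
    for level in levels:
        if level not in LEVELS:
            raise ValueError(f"unknown inference level: {level!r}")
        commands.append(shlex.split(BASE) + ["--used_inference_level", level])
    return commands


if __name__ == "__main__":
    for cmd in sweep_commands():
        print(" ".join(cmd))
```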
    

6 CrowdHuman Results

| | name | AP(↑) | mMR(↓) | R(↑) | Where in Our Paper |
|---|---|---|---|---|---|
| 1 | Deformable DETR | 86.7 | 54.0 | 92.5 | Table 6 |
| 2 | Sparse RCNN | 89.2 | 48.3 | 95.9 | Table 6 |
| 3 | ASAG-S | 91.3 | 43.5 | 96.9 | Table 6 |
  • We also run ASAG-S on the CrowdHuman dataset with R50 for 50 epochs, keeping the average number of anchors within 500.

  • Data preparation. After downloading the dataset, users should first convert the annotations to the COCO format by running crowdhumantools/convert_crowdhuman_to_coco.py. Before running it, please make sure the file paths inside the script are correct.

    path/to/crowdhuman/
      annotations/        # annotation json files
      CrowdHuman_train/   # train images
      CrowdHuman_val/     # val images
    
  • Training

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --decoder_type SparseRCNN
    
  • Inference

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_crowdhuman.pth --used_head aux_0 --decoder_type SparseRCNN
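
For readers curious about what the annotation conversion in the data preparation step involves, the sketch below shows the general shape of such a conversion. It is not the repository's crowdhumantools/convert_crowdhuman_to_coco.py. It assumes the commonly distributed .odgt layout (one JSON object per line with an `ID` and a `gtboxes` list, each box carrying a `tag` and a full-body `fbox` given as [x, y, w, h]), that images are named `<ID>.jpg`, and that boxes with a tag other than "person" are treated as ignore regions. Image sizes are not stored in .odgt files, so this sketch takes them as a hypothetical `sizes` argument.

```python
import json
from typing import Dict, Iterable, Tuple


def odgt_to_coco(lines: Iterable[str], sizes: Dict[str, Tuple[int, int]]) -> dict:
    """Convert CrowdHuman .odgt records to a COCO-style dict.

    `sizes` maps image ID -> (width, height); a real converter would read
    these from the image files instead.
    """
    images, annotations = [], []
    records = (line for line in lines if line.strip())  # skip blank lines
    for image_id, line in enumerate(records, start=1):
        record = json.loads(line)
        width, height = sizes[record["ID"]]
        images.append({"id": image_id, "file_name": record["ID"] + ".jpg",
                       "width": width, "height": height})
        for box in record.get("gtboxes", []):
            x, y, w, h = box["fbox"]  # assumption: full-body box, [x, y, w, h]
            annotations.append({
                "id": len(annotations) + 1, "image_id": image_id, "category_id": 1,
                "bbox": [x, y, w, h], "area": w * h,
                # assumption: non-"person" tags (e.g. "mask") become ignore regions
                "iscrowd": 0 if box.get("tag") == "person" else 1,
            })
    return {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "person"}]}


if __name__ == "__main__":
    sample = ['{"ID": "img0", "gtboxes": [{"tag": "person", "fbox": [10, 20, 30, 40]}]}']
    print(json.dumps(odgt_to_coco(sample, {"img0": (640, 480)}), indent=2))
```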
    

7 Equipping with stronger backbone

| | backbone | AP | APs | APm | APl |
|---|---|---|---|---|---|
| 1 | torchvision R50 | 42.6 | 25.9 | 45.8 | 56.9 |
| 2 | CrossCL R50 | 43.8 | 26.1 | 47.4 | 59.3 |
  • We run ASAG-A with our self-supervised pretrained backbone CrossCL under the 1x schedule, which boosts ASAG by 1.2 AP (42.6 → 43.8).

  • The pretrained backbone can be found in Google Drive / Baidu (pwd: asag).

  • Training

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint crosscl_resnet50.pth
    
  • Inference

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_crosscl.pth --used_head aux_2
    

8 License

ASAG is released under the Apache 2.0 license. Please see the LICENSE file for more information.

9 Bibtex

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{fu2023asag,
  title={ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation},
  author={Fu, Shenghao and Yan, Junkai and Gao, Yipeng and Xie, Xiaohua and Zheng, Wei-Shi},
  booktitle={ICCV},
  year={2023},
}

@inproceedings{yan2023cross,
  title={Self-supervised Cross-stage Regional Contrastive Learning for Object Detection},
  author={Yan, Junkai and Yang, Lingxiao and Gao, Yipeng and Zheng, Wei-Shi},
  booktitle={ICME},
  year={2023},
}

10 Acknowledgement

Our ASAG is heavily inspired by many outstanding prior works, including

We thank the authors of the above projects for open-sourcing their implementations!