
CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection (CVPR2023)

Abstract

Open-world object detection (OWOD), a more general and challenging goal, requires a model trained on data containing only known objects to detect both known and unknown objects, and to incrementally learn to identify these unknown objects. In existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), we observe three hindering problems: (i) the inclusion of unknown-object detection substantially reduces the model's ability to detect known objects; (ii) the PLM does not adequately utilize prior knowledge of the inputs; (iii) the fixed manner of the PLM cannot guarantee that the model is trained in the right direction. We observe that, to alleviate confusion, humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localizing and identifying a single object simultaneously. This motivates our novel solution, CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via two cascaded transformer decoders. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving CAT's ability to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, i.e., MS-COCO and PASCAL VOC, show that our model outperforms the state-of-the-art on all metrics for OWOD, incremental object detection (IOD), and open-set detection.
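As a rough illustration of the self-adaptive pseudo-labelling idea, the sketch below combines a model-driven score with an input-driven score for each candidate box and keeps the highest-ranked candidates as unknown pseudo-labels. The convex-combination weighting and the score sources here are illustrative assumptions, not the paper's exact formulation.

```python
# Conceptual sketch only: rank candidate boxes by a weighted combination of
# a model-driven objectness cue and an input-driven prior, then keep the
# top-k as pseudo-labels for unknown objects. The alpha weighting is an
# assumption for illustration, not the mechanism used in the paper.

def select_unknown_pseudo_labels(model_scores, input_scores, alpha=0.5, top_k=5):
    """Return indices of the top_k candidates ranked by the combined score."""
    assert len(model_scores) == len(input_scores)
    combined = [alpha * m + (1 - alpha) * p
                for m, p in zip(model_scores, input_scores)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return ranked[:top_k]
```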

Figure 2

Installation

Requirements

We have trained and tested our models on Ubuntu 16.04, CUDA 10.2, GCC 5.4, and Python 3.7.

conda create -n cat python=3.7 pip
conda activate cat
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Backbone features

Download the self-supervised backbone from here and place it in the models folder.

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Dataset & Results

OWOD proposed splits



Results

| Method | Task1 U-Recall | Task1 mAP | Task2 U-Recall | Task2 mAP | Task3 U-Recall | Task3 mAP | Task4 mAP |
|---|---|---|---|---|---|---|---|
| ORE-EBUI | 4.9 | 56.0 | 2.9 | 39.4 | 3.9 | 29.7 | 25.3 |
| OW-DETR | 7.5 | 59.2 | 6.2 | 42.9 | 5.7 | 30.8 | 27.8 |
| CAT | 23.7 | 60.0 | 19.1 | 44.1 | 24.4 | 34.8 | 30.4 |

Weights

T1 weight | T2_ft weight | T3_ft weight | T4_ft weight

OWDETR proposed splits



Results

| Method | Task1 U-Recall | Task1 mAP | Task2 U-Recall | Task2 mAP | Task3 U-Recall | Task3 mAP | Task4 mAP |
|---|---|---|---|---|---|---|---|
| ORE-EBUI | 1.5 | 61.4 | 3.9 | 40.6 | 3.6 | 33.7 | 31.8 |
| OW-DETR | 5.7 | 71.5 | 6.2 | 43.8 | 6.9 | 38.5 | 33.1 |
| CAT | 24.0 | 74.2 | 23.0 | 50.7 | 24.6 | 45.0 | 42.8 |

Weights

T1 weight | T2_ft weight | T3_ft weight | T4_ft weight

Dataset Preparation

The splits are present inside the data/VOC2007/CAT/ImageSets/ folder.

  1. Make empty JPEGImages and Annotations_selective directories:
mkdir data/VOC2007/CAT/JPEGImages/
mkdir data/VOC2007/CAT/Annotations_selective/
  2. Download the COCO images and annotations from the COCO dataset.
  3. Unzip the train2017 and val2017 folders. The current directory structure should look like:
CAT/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/
  4. Move all images from train2017/ and val2017/ to the JPEGImages folder:
cd CAT/data
mv coco/train2017/*.jpg VOC2007/CAT/JPEGImages/
mv coco/val2017/*.jpg VOC2007/CAT/JPEGImages/
  5. Annotations_selective: the annotations can be generated with the file "make_pseudo_labels.py".
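Moving the COCO images into the VOC-style folder can equivalently be scripted in Python; a minimal sketch, assuming the directory layout shown above (the paths mirror the shell commands and are not part of the repository's own tooling):

```python
# Minimal Python equivalent of the image-move step: relocate all COCO
# train/val .jpg files into the VOC-style JPEGImages folder. Default paths
# assume the layout described above.
import shutil
from pathlib import Path

def move_coco_images(coco_root="data/coco", dest="data/VOC2007/CAT/JPEGImages"):
    """Move every .jpg in train2017/ and val2017/ into dest; return the count."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for split in ("train2017", "val2017"):
        for img in Path(coco_root, split).glob("*.jpg"):
            shutil.move(str(img), str(dest_dir / img.name))
            moved += 1
    return moved
```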

The files should be organized in the following structure:

CAT/
└── data/
    └── VOC2007/
        └── CAT/
            ├── JPEGImages
            ├── ImageSets
            └── Annotations_selective

Currently, the dataloader and evaluator for CAT follow the PASCAL VOC format.
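For reference, a PASCAL VOC annotation file can be read with the standard library alone; a minimal sketch (the field names follow the standard VOC XML schema, not a helper from this repository):

```python
# Parse a PASCAL VOC-style annotation and return each object's class name
# and bounding box. Uses only the standard library.
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        coords = tuple(int(float(box.findtext(k)))
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects
```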

Training

Training on single node

To train CAT on a single node with 8 GPUs, run

./run.sh

Training on slurm cluster

To train CAT on a Slurm cluster having 2 nodes with 8 GPUs each, run

sbatch run_slurm.sh

Evaluation

To reproduce any of the above results, run the run_eval.sh script and point it to the corresponding pretrained weights.

Note: For more training and evaluation details, please check the Deformable DETR repository.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you use CAT, please consider citing:

@article{ma2023cat,
  title={CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection},
  author={Ma, Shuailei and Wang, Yuefeng and Fan, Jiaqi and Wei, Ying and Li, Thomas H and Liu, Hongli and Lv, Fanbing},
  journal={arXiv preprint arXiv:2301.01970},
  year={2023}
}

Contact

Should you have any questions, please contact 📧 xiaomabufei@gmail.com

Acknowledgments

CAT builds on the code bases of previous works such as OW-DETR, Deformable DETR, DETReg, and OWOD. If you find CAT useful, please consider citing these works as well.