CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection (CVPR 2023)
Open-world object detection (OWOD), as a more general and challenging goal, requires a model trained only on known objects to detect both known and unknown objects and to incrementally learn to identify the unknown ones. For existing works, which employ a standard detection framework with a fixed pseudo-labelling mechanism (PLM), we observe three hindering problems: (i) including the detection of unknown objects substantially reduces the model's ability to detect known ones; (ii) the PLM does not adequately utilize the prior knowledge of the inputs; (iii) the fixed manner of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localizing and identifying a single object simultaneously, which alleviates confusion. This motivates a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via two cascade transformer decoders. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven PLM and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, i.e., MS-COCO and PASCAL VOC, show that our model outperforms the state-of-the-art on all metrics for OWOD, incremental object detection (IOD), and open-set detection.
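For intuition, the decoupling can be sketched as two transformer decoders applied in sequence: the first localizes foreground objects class-agnostically, the second identifies each localized object. The snippet below is a minimal PyTorch illustration only; the class, names, and dimensions are our assumptions, not the actual model in this repository (which builds on Deformable DETR).

```python
import torch
import torch.nn as nn

class CascadeDecoder(nn.Module):
    """Illustrative sketch of localization-then-identification decoding.
    All names and dimensions here are hypothetical."""

    def __init__(self, d_model=256, nhead=8, num_layers=6, num_classes=81):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead)
        # First decoder focuses on *where* foreground objects are.
        self.loc_decoder = nn.TransformerDecoder(layer, num_layers)
        # Second decoder identifies *what* each localized object is.
        self.id_decoder = nn.TransformerDecoder(layer, num_layers)
        self.bbox_head = nn.Linear(d_model, 4)           # box regression
        self.cls_head = nn.Linear(d_model, num_classes)  # known + "unknown" logits

    def forward(self, queries, memory):
        # queries: (num_queries, B, d_model); memory: encoder features (HW, B, d_model)
        loc_feats = self.loc_decoder(queries, memory)    # class-agnostic localization
        boxes = self.bbox_head(loc_feats).sigmoid()      # normalized box parameters
        id_feats = self.id_decoder(loc_feats, memory)    # identification conditioned on localization
        logits = self.cls_head(id_feats)
        return boxes, logits

# Usage (shapes only):
# dec = CascadeDecoder()
# boxes, logits = dec(torch.zeros(100, 2, 256), torch.zeros(1064, 2, 256))
```

Decoupling in this way lets the localization decoder attend to all foreground regions, known or unknown, while the identification decoder focuses purely on classification.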
We have trained and tested our models on Ubuntu 16.04, CUDA 10.2, GCC 5.4, and Python 3.7.
conda create -n cat python=3.7 pip
conda activate cat
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
Download the self-supervised backbone from here and place it in the `models` folder.
cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
| Method | Task1 U-Recall | Task1 mAP | Task2 U-Recall | Task2 mAP | Task3 U-Recall | Task3 mAP | Task4 mAP |
|---|---|---|---|---|---|---|---|
| ORE-EBUI | 4.9 | 56.0 | 2.9 | 39.4 | 3.9 | 29.7 | 25.3 |
| OW-DETR | 7.5 | 59.2 | 6.2 | 42.9 | 5.7 | 30.8 | 27.8 |
| CAT | 23.7 | 60.0 | 19.1 | 44.1 | 24.4 | 34.8 | 30.4 |
Pretrained weights: T1 weight | T2_ft weight | T3_ft weight | T4_ft weight
| Method | Task1 U-Recall | Task1 mAP | Task2 U-Recall | Task2 mAP | Task3 U-Recall | Task3 mAP | Task4 mAP |
|---|---|---|---|---|---|---|---|
| ORE-EBUI | 1.5 | 61.4 | 3.9 | 40.6 | 3.6 | 33.7 | 31.8 |
| OW-DETR | 5.7 | 71.5 | 6.2 | 43.8 | 6.9 | 38.5 | 33.1 |
| CAT | 24.0 | 74.2 | 23.0 | 50.7 | 24.6 | 45.0 | 42.8 |
Pretrained weights: T1 weight | T2_ft weight | T3_ft weight | T4_ft weight
The splits are present inside the `data/VOC2007/CAT/ImageSets/` folder.
- Make empty `JPEGImages` and `Annotations_selective` directories:
mkdir data/VOC2007/CAT/JPEGImages/
mkdir data/VOC2007/CAT/Annotations_selective/
- Download the COCO images and annotations from the COCO dataset.
- Unzip the train2017 and val2017 folders. The current directory structure should look like:
CAT/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/
- Move all images from `train2017/` and `val2017/` to the `JPEGImages` folder:
cd CAT/data
mv coco/train2017/*.jpg VOC2007/CAT/JPEGImages/
mv coco/val2017/*.jpg VOC2007/CAT/JPEGImages/
- `Annotations_selective`: the annotations can be generated with the `make_pseudo_labels.py` script (a toy sketch of the underlying idea follows below).
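For intuition, the self-adaptive pseudo-labelling combines a model-driven score with an input-driven score (e.g., from selective-search-style proposals) and keeps the top-scoring candidate boxes as unknown pseudo-labels. The function below is a toy sketch under those assumptions; its name, weighting, and signature are hypothetical, and the real logic lives in `make_pseudo_labels.py` and the training code.

```python
import torch

def fuse_pseudo_labels(boxes, model_scores, input_scores, alpha=0.5, top_k=5):
    """Toy pseudo-label selection: fuse model-driven and input-driven scores
    with a weight `alpha` (assumed fixed here; the paper adapts it during
    training) and keep the top-k candidate boxes as "unknown" pseudo-labels.

    boxes:        (N, 4) candidate boxes not matched to any known-class GT
    model_scores: (N,)   model-driven confidence per box
    input_scores: (N,)   input-driven confidence per box (e.g., selective search)
    """
    fused = alpha * model_scores + (1.0 - alpha) * input_scores
    keep = fused.topk(min(top_k, fused.numel())).indices
    return boxes[keep], fused[keep]
```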
The files should be organized in the following structure:
CAT/
└── data/
    └── VOC2007/
        └── CAT/
            ├── JPEGImages
            ├── ImageSets
            └── Annotations_selective
Currently, the dataloader and evaluator for CAT follow the VOC format.
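As a reference for what "VOC format" means here, each image is described by a Pascal-VOC-style XML annotation. A small standard-library helper like the following (hypothetical, not part of this repository) shows the fields such a dataloader consumes:

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse a Pascal-VOC-style XML file into
    (class_name, [xmin, ymin, xmax, ymax]) pairs."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bndbox = obj.find("bndbox")
        box = [int(float(bndbox.find(tag).text))
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((name, box))
    return objects
```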
To train CAT on a single node with 8 GPUs, run
./run.sh
To train CAT on a slurm cluster with 2 nodes of 8 GPUs each, run
sbatch run_slurm.sh
To reproduce any of the above results, run the run_eval.sh file and point it to the corresponding pretrained weights.
Note: for more training and evaluation details, please check the Deformable DETR repository.
This repository is released under the Apache 2.0 license as found in the LICENSE file.
If you use CAT, please consider citing:
@article{ma2023cat,
title={CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection},
author={Ma, Shuailei and Wang, Yuefeng and Fan, Jiaqi and Wei, Ying and Li, Thomas H and Liu, Hongli and Lv, Fanbing},
journal={arXiv preprint arXiv:2301.01970},
year={2023}
}
Should you have any questions, please contact 📧 xiaomabufei@gmail.com
Acknowledgments:
CAT builds on the code bases of previous works such as OW-DETR, Deformable DETR, DETReg, and OWOD. If you find CAT useful, please consider citing these works as well.