GOOD: Exploring geometric cues for detecting objects in an open world

This repository is the official implementation of GOOD: Exploring geometric cues for detecting objects in an open world (ICLR 2023).

What is it?

We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models overfit to the training classes and often fail to detect novel-looking objects. This is because RGB-based models rely primarily on appearance similarity to detect novel objects and are also prone to overfitting to shortcut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators.

As we can see from the following figure, geometric cues are much more generalizable across different categories, and can effectively narrow the generalization gap between base (known) and novel (unknown) categories. Our method has achieved SOTA results on many open-world detection benchmarks including COCO Person to non-Person, VOC to non-VOC, LVIS COCO to non-COCO, and COCO to UVO.

How do we do it?

As shown in the following figure, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. The top-ranked pseudo boxes are added to the annotation pool for Phase II training, i.e., a class-agnostic object detector is directly trained on the RGB input using both the base class and pseudo annotations. At inference time, we only need the model from Phase II.
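The pseudo-labeling step described above can be sketched as follows. This is only an illustrative sketch, not the repository's implementation: the function name `merge_top_k_pseudo_boxes`, the `is_pseudo` flag, and the single category id are assumptions made for this example, and any filtering of pseudo boxes against existing base-class annotations is left out.

```python
from collections import defaultdict


def merge_top_k_pseudo_boxes(base_annotations, pseudo_boxes, k=1):
    """Append the k highest-scoring pseudo boxes of each image to the base annotations.

    base_annotations: COCO-style dicts with "id", "image_id", "bbox", "category_id".
    pseudo_boxes: dicts with "image_id", "bbox" ([x, y, w, h]) and "score".
    Returns a new list; the inputs are not modified.
    """
    # Group the proposals by image so they can be ranked per image.
    per_image = defaultdict(list)
    for box in pseudo_boxes:
        per_image[box["image_id"]].append(box)

    merged = list(base_annotations)
    next_id = max((ann["id"] for ann in merged), default=0) + 1

    for image_id in sorted(per_image):
        ranked = sorted(per_image[image_id], key=lambda b: b["score"], reverse=True)
        for box in ranked[:k]:
            x, y, w, h = box["bbox"]
            merged.append({
                "id": next_id,
                "image_id": image_id,
                "bbox": [x, y, w, h],
                "area": w * h,
                "category_id": 1,   # assumed single class-agnostic "object" category
                "iscrowd": 0,
                "is_pseudo": True,  # hypothetical flag, only for bookkeeping
            })
            next_id += 1
    return merged
```

With `k=1`, each image contributes at most one pseudo box to the annotation pool used in Phase II.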

Pre-trained Weights

You can download pretrained weights here:

Training	Eval	url	OLN AR_N@100	GOOD AR_N@100
Person, COCO	Non-Person, COCO	Pseudo-box/GOOD	16.5	26.2
VOC, COCO	Non-VOC, COCO	Pseudo-box/GOOD	33.2	39.3
COCO	Non-COCO, LVIS	Pseudo-box/GOOD	27.4	29.0

For all GOOD models, we find that the optimal number k of pseudo labels is 1. Due to some modifications of the evaluation code, the numbers differ slightly from those reported in the paper.
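For readers unfamiliar with the metric, AR_N@100 is the average recall on novel (unseen) classes when each image is allowed at most 100 detections. The sketch below shows the idea for a single image, averaging recall over the COCO-style IoU thresholds 0.50 to 0.95. It is a simplified illustration under stated assumptions, not the evaluation code used in this repository: the helper names are hypothetical, boxes are assumed to be in [x1, y1, x2, y2] format, and the greedy per-ground-truth matching is simpler than the matching in the official COCO evaluation.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt_boxes, detections, max_dets=100, iou_thresholds=None):
    """Average recall of the top `max_dets` detections for one image.

    gt_boxes: ground-truth boxes of the novel classes only.
    detections: list of (score, box) pairs from a class-agnostic detector.
    """
    if iou_thresholds is None:
        iou_thresholds = [0.5 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

    # Keep only the highest-scoring detections within the budget.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)[:max_dets]
    top = [box for _, box in ranked]

    recalls = []
    for thr in iou_thresholds:
        matched = set()  # each detection may cover at most one ground-truth box
        hits = 0
        for gt in gt_boxes:
            best_j, best_iou = None, thr
            for j, det in enumerate(top):
                if j in matched:
                    continue
                iou = box_iou(gt, det)
                if iou >= best_iou:
                    best_j, best_iou = j, iou
            if best_j is not None:
                matched.add(best_j)
                hits += 1
        recalls.append(hits / len(gt_boxes) if gt_boxes else 0.0)
    return sum(recalls) / len(recalls)
```

Because only novel-class ground truth enters the recall, detections that land on base-class objects neither help nor hurt the score, except by using up part of the 100-detection budget.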

Installation

This repository is based on mmdetection and OLN.

You can use the following commands to create a conda environment with the required dependencies.

conda create -n good python=3.8 -y
conda activate good
conda install pytorch=1.7.0 torchvision -c pytorch
conda install cuda -c nvidia
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install -v -e .

Extracting geometric cues

Please refer to the Omnidata repository for the pretrained models. We provide example code for extracting depth and normals here. To use it, place it inside your copy of the Omnidata repository.

Phase-I training and generating pseudo-labels

To train the Phase-I model, run this command:

python tools/train_good.py configs/good/phase1_depth.py

After training, you can run this command to extract pseudo labels and generate a COCO-format annotation file:

python tools/test_extract_proposals.py configs/good/phase1_depth.py path-to-checkpoint/latest.pth --eval bbox --modality depth --out path-to-save-pseudo-box-json

Phase-II training

To train the Phase-II model, run this command:

python tools/train_good.py configs/good/phase2_good.py

Note that this config file differs from the one used in Phase-I. You need to specify the filenames of the pseudo boxes in the config file.

Evaluation

To evaluate the model, run:

python tools/test_good.py configs/good/phase2_good.py path-to-checkpoint/latest.pth --eval bbox

To cite this work:

@inproceedings{
    huang2023good,
    title={{GOOD}: Exploring geometric cues for detecting objects in an open world},
    author={Haiwen Huang and Andreas Geiger and Dan Zhang},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=W-nZDQyuy8D}
}

License

This code repository is open-sourced under the MIT license.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.
