Aligning Bag of Regions for Open-Vocabulary Object Detection

Introduction

This is the official code release for the paper Aligning Bag of Regions for Open-Vocabulary Object Detection.

Aligning Bag of Regions for Open-Vocabulary Object Detection,
Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy
In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper] [Supp] [Project Page (TBD)] [BibTeX]

Installation

This project is based on MMDetection 3.x.

It requires the following OpenMMLab packages:

MMEngine >= 0.6.0
MMCV >= 2.0.0rc4
MMDetection >= 3.0.0rc6
lvis-api

pip install openmim mmengine
mim install "mmcv>=2.0.0rc4"
pip install git+https://github.com/lvis-dataset/lvis-api.git
mim install "mmdet>=3.0.0rc6"
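
After installation, it can help to confirm which versions were actually installed. The following is a minimal sketch using only the Python standard library; it is not part of this project. The distribution names in REQUIRED are assumptions (in particular, lvis-api is assumed to register itself with pip as "lvis"), and the script only prints the installed version next to the stated minimum without comparing them.

```python
from importlib import metadata

# Assumed pip distribution names mapped to the minimum versions listed above.
# "lvis" is an assumption about the name that lvis-api registers with pip.
REQUIRED = {
    "mmengine": "0.6.0",
    "mmcv": "2.0.0rc4",
    "mmdet": "3.0.0rc6",
    "lvis": None,  # no minimum version is stated for lvis-api
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(required=REQUIRED):
    """Print the installed version next to the stated minimum for each package."""
    for name, version in installed_versions(required).items():
        minimum = required[name] or "any"
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: installed={status}, minimum={minimum}")


report()
```

Any line that shows NOT INSTALLED, or a version below the stated minimum, points to the install command that should be re-run.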

License

This project is released under the NTU S-Lab License 1.0.

Usage

Obtain CLIP Checkpoints

Our method is implemented with CLIP's ViT-B/32 model. Download the model's state_dict from GoogleDrive and place it under checkpoints. Alternatively, install CLIP with pip install git+https://github.com/openai/CLIP.git and run

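import os

import clip
import torch

# Create the output directory first; torch.save does not create it.
os.makedirs('checkpoints', exist_ok=True)
model, _ = clip.load("ViT-B/32")
torch.save(model.state_dict(), 'checkpoints/clip_vitb32.pth')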
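
To sanity-check the saved file, one option is to group its entries by top-level key prefix and count the elements in each group. This is a minimal sketch, not part of this project: count_params_by_prefix is a hypothetical helper, and it only assumes that each value in the state_dict exposes a shape attribute, as torch tensors do.

```python
import math
from collections import Counter


def count_params_by_prefix(state_dict):
    """Sum the element counts of all entries, grouped by top-level key prefix."""
    counts = Counter()
    for key, tensor in state_dict.items():
        # "visual.conv1.weight" -> "visual"; keys without a dot are their own group.
        prefix = key.split(".", 1)[0]
        # math.prod of an empty shape is 1, so scalar entries count as one element.
        counts[prefix] += math.prod(tensor.shape)
    return dict(counts)


# Usage with the checkpoint saved above (requires torch):
#   import torch
#   state_dict = torch.load("checkpoints/clip_vitb32.pth", map_location="cpu")
#   for prefix, n in sorted(count_params_by_prefix(state_dict).items()):
#       print(f"{prefix}: {n:,}")
```

A checkpoint that loads and shows non-zero counts for both the image and text branches is a reasonable sign that the file was written completely.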

Training and Testing

Training and testing on OV-COCO are currently supported.

Citation

@inproceedings{wu2023baron,
    title={Aligning Bag of Regions for Open-Vocabulary Object Detection},
    author={Size Wu and Wenwei Zhang and Sheng Jin and Wentao Liu and Chen Change Loy},
    year={2023},
    booktitle={CVPR},
}

wusize/ovdet