LocOV: Localized Vision-Language Matching for Open-vocabulary Object Detection

News

2022-07 (v0.1): This repository is the official PyTorch implementation of our GCPR 2022 paper, Localized Vision-Language Matching for Open-vocabulary Object Detection.

News
Table of Contents
Installation
Prepare datasets
- Download datasets
- Precompute the text features
Train and validate Open Vocabulary Detection
- Model Outline
- Useful script commands
Acknowledgements
License
Citation

Installation

Requirements

Linux or macOS with Python ≥ 3.6
PyTorch ≥ 1.8 and a matching torchvision build. Install them together from pytorch.org so that the two are compatible. Check that the PyTorch version matches both the one required by Detectron2 and your CUDA version.
Detectron2: follow Detectron2 installation instructions.

Originally the code was tested on python=3.8.13, torch=1.10.0, cuda=11.2 and OS Ubuntu 20.04.
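
The requirements above can be checked before installing anything else. The following is a minimal sketch and not part of this repository: the helper names `parse_major_minor` and `check_environment` are hypothetical, and the script only verifies the Python and PyTorch minimums listed above and whether `torch` and `detectron2` are importable. It does not check CUDA or Detectron2 compatibility.

```python
import importlib.util
import re
import sys


def parse_major_minor(version_text):
    """Extract (major, minor) from strings such as '1.10.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version_text))
    return int(match.group(1)), int(match.group(2))


def check_environment(min_python=(3, 6), min_torch=(1, 8)):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for package in ("torch", "detectron2"):
        if importlib.util.find_spec(package) is None:
            problems.append(package + " is not installed")
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the script also runs without PyTorch

        if parse_major_minor(torch.__version__) < min_torch:
            problems.append("PyTorch %d.%d or newer is required" % min_torch)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the file directly prints one warning per unmet requirement and prints nothing when all checks pass.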

git clone https://github.com/lmb-freiburg/locov.git
cd locov

Prepare datasets

Download datasets

Download MS COCO training and validation datasets. Download detection and caption annotations for retrieval from the original page.
Save the data in datasets_data.
Run the script to create the annotation subsets that include only the base and novel categories:

python tools/convert_annotations_to_ov_sets.py
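
To illustrate what creating an annotation subset involves, here is a minimal sketch of filtering a COCO-style annotation dictionary by category name. It is not the code in `tools/convert_annotations_to_ov_sets.py`: the function `filter_coco_by_categories` is hypothetical, the toy category split below is illustrative and not the real base/novel split, and dropping images that end up without annotations is an assumption about the desired behaviour.

```python
def filter_coco_by_categories(coco, keep_names):
    """Keep only the categories in keep_names, their annotations, and the
    images that still have at least one annotation afterwards."""
    keep_names = set(keep_names)
    categories = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in categories}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    image_ids = {a["image_id"] for a in annotations}
    images = [im for im in coco["images"] if im["id"] in image_ids]
    return {"images": images, "annotations": annotations, "categories": categories}


# Tiny in-memory stand-in for a COCO detection annotation file.
toy_coco = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 3},
    ],
    "categories": [
        {"id": 1, "name": "cat_a"},
        {"id": 2, "name": "cat_b"},
        {"id": 3, "name": "cat_c"},
    ],
}

# Illustrative split only: one "base" and one "novel" category.
base_subset = filter_coco_by_categories(toy_coco, ["cat_a"])
base_and_novel = filter_coco_by_categories(toy_coco, ["cat_a", "cat_b"])
print(len(base_subset["annotations"]), len(base_and_novel["annotations"]))
```

With real data, the dictionary would come from `json.load` on the downloaded annotation file and the result would be written back with `json.dump`.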

Precompute the text features

Run the script to compute and save the object embeddings:

python tools/coco_bert_embeddings.py

Alternatively, download the precomputed embeddings: Embeddings

Precomputed generic object proposals

Train OLN on the MSCOCO known classes and extract proposals for the entire training set.
Alternatively, download the proposals precomputed for the MSCOCO training set with a model trained on known classes only: Proposals (3.9GB)

Train and validate Open Vocabulary Detection

Model Outline

Useful script commands

Train LSM stage

Run the script to train the Localized Semantic Matching stage

python train_ovnet.py --num-gpus 8 --resume --config-file configs/coco_lsm.yaml

Train STT stage

Run the script to train the Specialized Task Tuning stage, initialized from the final weights of the LSM stage

python train_ovnet.py --num-gpus 8 --resume --config-file configs/coco_stt.yaml MODEL.WEIGHTS path_to_final_weights_lsm_stage

Evaluate

python train_ovnet.py --num-gpus 8 --resume --eval-only --config-file configs/coco_stt.yaml \
MODEL.WEIGHTS output/model-weights.pth \
OUTPUT_DIR output/eval_locov
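
The training and evaluation commands in this section share one shape: common flags, a config file, and optional `KEY VALUE` config overrides. The sketch below assembles them in one place, which can help when scripting the LSM stage, the STT stage, and evaluation back to back. The helper `build_command` is hypothetical and not part of this repository; the flags, config files, and paths are copied from the commands above, and `path_to_final_weights_lsm_stage` remains a placeholder to fill in.

```python
import shlex


def build_command(config_file, num_gpus=8, eval_only=False, overrides=()):
    """Assemble the argument list for one train_ovnet.py invocation.

    overrides is a sequence of (key, value) pairs appended after the flags.
    """
    command = ["python", "train_ovnet.py", "--num-gpus", str(num_gpus), "--resume"]
    if eval_only:
        command.append("--eval-only")
    command += ["--config-file", config_file]
    for key, value in overrides:
        command += [key, str(value)]
    return command


lsm = build_command("configs/coco_lsm.yaml")
stt = build_command(
    "configs/coco_stt.yaml",
    overrides=[("MODEL.WEIGHTS", "path_to_final_weights_lsm_stage")],
)
evaluate = build_command(
    "configs/coco_stt.yaml",
    eval_only=True,
    overrides=[
        ("MODEL.WEIGHTS", "output/model-weights.pth"),
        ("OUTPUT_DIR", "output/eval_locov"),
    ],
)

# Print the commands for inspection instead of running them.
for command in (lsm, stt, evaluate):
    print(" ".join(shlex.quote(part) for part in command))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root to execute the stage; printing first makes it easy to confirm the arguments.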

Benchmark results

Model zoo

Pretrained models can be found in the models directory.

Model	AP-novel	AP50-novel	AP-known	AP50-known	AP-general	AP50-general	Weights
LocOv	17.219	30.109	33.499	53.383	28.129	45.719	LocOv

Acknowledgements

This work was supported by Deutscher Akademischer Austauschdienst - German Academic Exchange Service (DAAD) Research Grants - Doctoral Programmes in Germany, 2019/20; grant number: 57440921.

The Deep Learning Cluster used in this work is partially funded by the German Research Foundation (DFG) - 417962828.

We especially thank the creators of the following github repositories for providing helpful code:

Zareian et al. for their open-vocabulary setup and code: OVR-CNN

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Citation

If you use our repository or find it useful in your research, please cite the following paper:

@InProceedings{Bravo2022locov,
  author       = "M. Bravo and S. Mittal and T. Brox",
  title        = "Localized Vision-Language Matching for Open-vocabulary Object Detection",
  booktitle    = "German Conference on Pattern Recognition (GCPR) 2022",
  year         = "2022"
}
