
OVAD: Open-vocabulary Attribute Detection

News

  • 2023-06 (v0.1): This repository is the official PyTorch implementation of our paper.
  • 2023-02: OVAD has been accepted by CVPR 2023! paper
  • 2022-11: Dataset visualizer page release: page
  • 2022-11: Project page release: page
  • 2022-11: Arxiv paper release

Table of Contents

  • Installation
  • Prepare datasets
  • Open Vocabulary Detection Benchmark
  • Acknowledgements
  • License
  • Citation

Installation

Requirements

  • Linux or macOS with Python ≥ 3.6
  • PyTorch ≥ 1.8, installed together with a matching torchvision from pytorch.org. Make sure the PyTorch version matches the one required by Detectron2 and your CUDA version.
  • Detectron2: follow Detectron2 installation instructions.

The code was originally tested with python=3.8.13, torch=1.10.0, and cuda=11.2 on Ubuntu 20.04.

Example conda environment setup

conda create --name ovad_env python=3.8 -y
conda activate ovad_env

# install PyTorch and Detectron2 from previous section
# under your working directory
git clone https://github.com/OVAD-Benchmark/ovad-benchmark-code.git
cd ovad-benchmark-code
pip install -r requirements.txt
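Before preparing the data, it can help to verify that the installed versions line up with the tested configuration above. A minimal sanity check (a sketch, not part of the repository) could look like this:

# check_env.py: print the installed versions to compare against the tested setup
import torch
import detectron2

print("torch:", torch.__version__)            # tested with 1.10.0
print("cuda available:", torch.cuda.is_available())
print("cuda version:", torch.version.cuda)    # tested with 11.2
print("detectron2:", detectron2.__version__)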

Prepare datasets

Download datasets

The basic functions of our code use the COCO images and our OVAD benchmark annotations. Download the datasets from their official websites and symlink (or place) them under $datasets/ following the layout below.

datasets/
    coco/
        train2017/
        val2017/
        annotations/
            captions_train2017.json
            instances_train2017.json 
            instances_val2017.json
    ovad/
        ovad2000.json
        ovad_evaluator.py
        ovad.py
        README.md
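To confirm the OVAD annotations are in place, a small inspection script can be handy. This is only a sketch and assumes nothing beyond ovad2000.json being a JSON object; the printed keys depend on the file itself, which is expected to follow a COCO-style layout:

# inspect_ovad.py: peek at the OVAD annotation file (sketch, not part of the repo)
import json

with open("datasets/ovad/ovad2000.json") as f:
    ovad = json.load(f)

print("top-level keys:", list(ovad.keys()))
for key, value in ovad.items():
    if isinstance(value, list):
        print(f"  {key}: {len(value)} entries")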

Filter annotations to separate base and novel classes

python tools/make_ovd_json.py --base_novel base --json_path datasets/coco/annotations/instances_train2017.json
python tools/make_ovd_json.py --base_novel base_novel17 --json_path datasets/coco/annotations/instances_val2017.json

This generates two new annotation json files:

  • $datasets/coco/annotations/instances_train2017_base.json: training file with only base class (48) annotations
  • $datasets/coco/annotations/instances_val2017_base_novel17.json: validation file with base (48) and novel17 (17) class annotations
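As a quick check of the splits, you can count the annotated categories in each generated file. The sketch below assumes the files keep the standard COCO "annotations"/"category_id" fields; it should report 48 classes for the base training file and 65 for the base+novel17 validation file:

# count_split_categories.py: sanity-check the filtered annotation files (sketch)
import json

for name in ["instances_train2017_base", "instances_val2017_base_novel17"]:
    path = f"datasets/coco/annotations/{name}.json"
    with open(path) as f:
        data = json.load(f)
    cat_ids = {ann["category_id"] for ann in data["annotations"]}
    print(f"{name}: {len(cat_ids)} annotated classes, {len(data['annotations'])} boxes")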

Extract the OVAD-box instances from the annotations for the box oracle evaluation

python tools/extract_obj_boxes.py

This should generate a new folder under $datasets/.

datasets/
    ovad_box_instances/
        2000_img/
            bb_images/
            bb_labels/
                ovad_4fold_ids.pkl  
                ovad_labels.pkl
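The .pkl files hold the labels and fold ids used for the box-oracle evaluation. Their internal structure is not documented here, so the sketch below only loads them and reports their type and size:

# inspect_box_labels.py: load the extracted box-instance label files (sketch)
import pickle

for name in ["ovad_labels.pkl", "ovad_4fold_ids.pkl"]:
    with open(f"datasets/ovad_box_instances/2000_img/bb_labels/{name}", "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{name}: {type(obj).__name__} with {size} entries")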

Precompute the text features

Compute and save the noun and attribute text features for the OVAD benchmark, or download them from link and place them under $datasets/text_representations.

python tools/dump_attribute_features.py --out_dir datasets/text_representations   \
      --save_obj_categories --save_att_categories --fix_space \
      --prompt none \
      --avg_synonyms --not_use_object --prompt_att none 

This should generate a new folder under $datasets/.

datasets/
    text_representations/
        ovad_att_clip-ViT-B32_+catt.npy
        ovad_att_clip-ViT-B32_+catt.pkl
        ovad_obj_cls_clipViT-B32_none+cname.npy
        ovad_obj_cls_clipViT-B32_none+cname.pkl
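As a quick check that the features were written correctly, you can load the .npy files and print their shapes. This sketch assumes they store plain float arrays with one row per attribute or object class and one CLIP ViT-B/32 embedding per row; if they instead store pickled objects, pass allow_pickle=True to np.load:

# inspect_text_features.py: check the precomputed text embeddings (sketch)
import numpy as np

att = np.load("datasets/text_representations/ovad_att_clip-ViT-B32_+catt.npy")
obj = np.load("datasets/text_representations/ovad_obj_cls_clipViT-B32_none+cname.npy")
print("attribute embeddings:", att.shape, att.dtype)
print("object embeddings:", obj.shape, obj.dtype)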

Open Vocabulary Detection Benchmark

Box Setup: Vision Language Models

Evaluate the Vision Language Models using the box object setup:

  • Clone the respective repositories under $ovamc/: CLIP, Open CLIP, ALBEF, BLIP, BLIP2, X-VLM.
  • Run the evaluation commands below. The selected flags correspond to the best model and prompt configuration for each model. A conceptual sketch of the underlying box-attribute scoring is given after the commands.

# CLIP
python ovamc/ova_clip.py --model_arch "ViT-B/16" -bs 50 --prompt all
# Open CLIP
python ovamc/ova_open_clip.py --model_arch "ViT-B-32" -bs 50  --pretrained laion2b_e16 --prompt a
# ALBEF
python ovamc/ova_albef.py --model_arch "cocoGrounding" -bs 50 --prompt the --average_syn
# BLIP
python ovamc/ova_blip.py --model_arch "baseRetrieval" -bs 50 --prompt the
# BLIP 2
python ovamc/ova_blip2.py --model_arch "base_coco" -bs 50 --prompt the
# X-VLM
python ovamc/ova_xvlm.py --model_arch "pretrained16M2" -bs 50 --prompt all --average_syn
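The common idea behind these scripts is to crop each annotated object box and score it against attribute text prompts with the vision-language model in a zero-shot fashion. The sketch below illustrates this for CLIP only, with hypothetical file names and prompts; it is a conceptual illustration of the box setup, not the actual ovamc code, which additionally handles the OVAD prompt and synonym configurations:

# clip_box_scoring.py: conceptual sketch of box-setup attribute scoring with CLIP
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# hypothetical crop of one annotated object box
box_crop = preprocess(Image.open("box_crop.jpg")).unsqueeze(0).to(device)
# hypothetical attribute prompts; the benchmark uses its own prompt/synonym settings
prompts = clip.tokenize(["a photo of a red object", "a photo of a blue object"]).to(device)

with torch.no_grad():
    image_feat = model.encode_image(box_crop)
    text_feat = model.encode_text(prompts)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = image_feat @ text_feat.T    # cosine similarity per attribute prompt

print(scores.squeeze(0).tolist())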

Results

Model       ALL (mAP)   HEAD    MEDIUM   TAIL
CLIP        16.55       43.87   18.56    4.39
Open CLIP   16.97       44.30   18.45    5.46
ALBEF       21.04       44.18   23.83    9.39
BLIP        24.29       50.99   28.46    9.69
BLIP 2      25.5        49.75   30.45    10.84
X-VLM       27.99       49.7    34.1     12.8

Detection Setup: OVD Models

Model Outline

(method overview figure)

Train

TODO: To be added soon

Evaluate

Download the pretrained OVAD Baseline model and place it under $models/:

    models/
        OVAD_baseline.pth

# OVAD Baseline
python train_att_net.py --num-gpus 1 --resume \
--eval-only --config-file configs/Base_OVAD_C4_1x.yaml \
MODEL.WEIGHTS models/OVAD_baseline.pth 

# OVAD Baseline Box Oracle
python train_att_net.py --num-gpus 1 --resume \
--eval-only --config-file configs/BoxAnn_OVAD_C4_1x.yaml \
MODEL.WEIGHTS models/OVAD_baseline.pth

Benchmark results

Attribute results (mAP) for the given model weights

Att Type    Chance    OVAD Baseline
all          8.610    20.134
head        36.005    46.577
medium       7.329    24.580
tail         0.614     5.298

Object results AP50 Generalized (OVD-80) - 2,000 images

Model            Novel (32)   Base (48)   All (80)
OVAD Baseline    24.557       50.396      40.061

Object results AP50 Generalized (OVD) - 4,836 images

Model            Novel (17)   Base (48)   All (65)
OVAD Baseline    29.757       49.355      44.229

Acknowledgements

For the full references see our paper. We especially thank the creators of the following github repositories for providing helpful code:

  • Zhou et al. for their open-vocabulary model and code; our code is largely based on their repository: Detic
  • Zareian et al. for their open-vocabulary setup and code: OVR-CNN
  • Radford et al. for their CLIP model and code.

License

This work is released under the Creative Commons Zero v1.0 Universal (CC0-1.0) license.

Citation

If you use the OVAD dataset or code in your research or applications, please cite it using this BibTeX:

@InProceedings{Bravo_2023_CVPR,
    author    = {Bravo, Mar{\'\i}a A. and Mittal, Sudhanshu and Ging, Simon and Brox, Thomas},
    title     = {Open-Vocabulary Attribute Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {7041-7050}
}