
source code for CVPR'22 paper "Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild"


STUD

This is the source code accompanying the paper Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild by Xuefeng Du, Xin Wang, Gabriel Gozum and Yixuan Li.

The codebase is heavily based on CycleConf and Detectron2.

Ads

Check out our related ICLR'22 work VOS on object detection in still images and classification networks, and our NeurIPS'22 work SIREN on OOD detection for detection transformers, if you are interested!

Installation

Environment

  • CUDA 10.2
  • Python >= 3.7
  • PyTorch >= 1.6
  • A Detectron2 version that matches the PyTorch and CUDA versions.
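
The requirements above can be checked from Python before going further. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it only reports what is installed rather than enforcing the exact versions listed.

```python
import importlib.util
import sys


def check_environment():
    """Collect the versions that the Environment list refers to."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info >= (3, 7),
        "torch": None,
        "cuda": None,
        "detectron2": None,
    }
    # Import torch only if it is installed, so the check also works
    # before the dependencies have been set up.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if importlib.util.find_spec("detectron2") is not None:
        import detectron2

        info["detectron2"] = getattr(detectron2, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of None for torch or detectron2 simply means the package is not installed yet in the active environment.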

Dependencies

  1. Create a virtual env.
     • python3 -m pip install --user virtualenv
     • python3 -m venv stud
     • source stud/bin/activate
  2. Install dependencies.
     • pip install -r requirements.txt
     • Install PyTorch 1.9:

pip3 install torch torchvision

Check out the previous Pytorch versions here.

  • Install Detectron2. Build Detectron2 from source (gcc & g++ >= 5.4):

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Alternatively, install a pre-built Detectron2 (example for CUDA 10.2, PyTorch 1.9):

python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

More details can be found here.

Data Preparation

BDD100K

  1. Download the BDD100K MOT 2020 dataset (MOT 2020 Images and MOT 2020 Labels) and the detection labels (Detection 2020 Labels) here; a detailed description is available here. Put the BDD100K data under datasets/ in this repo. After downloading, the folder structure should look like this:
├── datasets
│   ├── bdd100k
│   │   ├── images
│   │   │    └── track
│   │   │        ├── train
│   │   │        ├── val
│   │   │        └── test
│   │   └── labels
│   │        ├── box_track_20
│   │        │   ├── train
│   │        │   └── val
│   │        └── det_20
│   │            ├── det_train.json
│   │            └── det_val.json
│   ├── waymo

  2. Convert the labels of the MOT 2020 data (train & val sets) into COCO format by running:

python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/val/ -o datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json -m track
python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/train/ -o datasets/bdd100k/labels/track/bdd100k_mot_train_coco.json -m track
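
Before running the conversion, it can help to confirm that the download matches the folder structure listed above. The sketch below is an illustration only: find_missing and EXPECTED_BDD_PATHS are hypothetical names, and the path list is copied from the tree in this section.

```python
from pathlib import Path

# Relative paths taken from the BDD100K folder structure in this section.
EXPECTED_BDD_PATHS = [
    "bdd100k/images/track/train",
    "bdd100k/images/track/val",
    "bdd100k/images/track/test",
    "bdd100k/labels/box_track_20/train",
    "bdd100k/labels/box_track_20/val",
    "bdd100k/labels/det_20/det_train.json",
    "bdd100k/labels/det_20/det_val.json",
]


def find_missing(datasets_root, expected=EXPECTED_BDD_PATHS):
    """Return the expected paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing("datasets")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  datasets/" + rel)
    else:
        print("BDD100K layout looks complete.")
```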

COCO

Download COCO2017 dataset from the official website.

Download the OOD dataset (JSON file) used when Youtube-VIS is the in-distribution dataset from here.

Download the OOD dataset (JSON file) used when BDD100K is the in-distribution dataset from here.

Put the two processed OOD JSON files in ./annotations.

The COCO dataset folder should have the following structure:

 └── datasets
     └── coco2017
         ├── annotations
         │   ├── xxx (the original json files)
         │   ├── instances_val2017_ood_wrt_bdd.json
         │   └── instances_val2017_ood_wrt_vis.json
         ├── train2017
         └── val2017
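
Once the JSON files are in place, a quick look at their contents can catch a wrong or truncated download. The following sketch assumes only the standard COCO layout (top-level "images", "annotations" and "categories" lists); summarize_coco is a hypothetical helper, not something shipped with this repo.

```python
import json
from collections import Counter


def summarize_coco(json_path):
    """Count images and annotations in a COCO-format annotation file."""
    with open(json_path, "r", encoding="utf-8") as handle:
        data = json.load(handle)

    # Map category ids to readable names; fall back to the raw id if missing.
    id_to_name = {cat["id"]: cat["name"] for cat in data.get("categories", [])}
    per_category = Counter(
        id_to_name.get(ann["category_id"], str(ann["category_id"]))
        for ann in data.get("annotations", [])
    )
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "per_category": dict(per_category),
    }
```

For example, calling summarize_coco("datasets/coco2017/annotations/instances_val2017_ood_wrt_bdd.json") returns the image count, the annotation count, and a per-category breakdown for that file.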

Youtube-VIS

Download the dataset from the official website.

Preprocess the dataset to generate the training and validation splits by running:

python datasets/convert_vis_val.py

The Youtube-VIS dataset folder should have the following structure:

 └── datasets
     └── vis
         └── train
             ├── JPEGImages
             ├── instances_train.json
             └── instances_val.json

nuImages

Download the dataset from the official website.

Convert the dataset by running:

python datasets/convert_nu.py
python datasets/convert_nu_ood.py

The nuImages dataset folder should have the following structure:

 └── datasets
     └── nuscence
         ├── v1.0-mini
         ├── v1.0-test
         ├── v1.0-val
         ├── v1.0-train
         ├── samples
         ├── semantic_masks
         ├── calibrated
         ├── nuimages_v1.0-val.json
         └── nu_ood.json

Before training, update the dataset paths in ./src/data/builtin.py to match the location of the datasets on your machine.
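
To illustrate what such path settings typically look like, here is a hedged sketch that builds the annotation and image paths for the converted BDD100K tracking splits and, optionally, registers them through Detectron2's register_coco_instances. The dataset names and both helper functions are hypothetical; the actual contents of builtin.py may be organised differently.

```python
from pathlib import Path


def build_bdd_track_paths(datasets_root):
    """Map illustrative dataset names to (annotation file, image folder)."""
    root = Path(datasets_root) / "bdd100k"
    paths = {}
    for split in ("train", "val"):
        name = f"example_bdd_track_{split}"  # hypothetical dataset name
        json_file = root / "labels" / "track" / f"bdd100k_mot_{split}_coco.json"
        image_root = root / "images" / "track" / split
        paths[name] = (str(json_file), str(image_root))
    return paths


def register_all(paths):
    """Register every entry with Detectron2 (needs detectron2 installed)."""
    from detectron2.data.datasets import register_coco_instances

    for name, (json_file, image_root) in paths.items():
        # Empty metadata dict; class names are read from the JSON file.
        register_coco_instances(name, {}, json_file, image_root)


if __name__ == "__main__":
    for name, (json_file, image_root) in build_bdd_track_paths("datasets").items():
        print(name, json_file, image_root)
```

Keeping the root directory in one argument means only a single value changes when the datasets live somewhere other than ./datasets.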

Training

Vanilla with BDD100K as the in-distribution dataset

python -m tools.train_net --config-file ./configs/BDD100k/R50_FPN_all.yaml --num-gpus 4

Vanilla with Youtube-VIS as the in-distribution dataset

python -m tools.train_net --config-file ./configs/VIS/R50_FPN_all.yaml --num-gpus 4

STUD on ResNet (BDD as ID data)

python -m tools.train_net --config-file ./configs/BDD100k/stud_resnet.yaml --num-gpus 4

STUD on RegNet (BDD as ID data)

python -m tools.train_net --config-file ./configs/BDD100k/stud_regnet.yaml --num-gpus 4

Download the pretrained backbone for RegNetX from here.

Pretrained models

The pretrained models for BDD100K can be downloaded here: vanilla, STUD-ResNet, and STUD-RegNet.

The pretrained models for Youtube-VIS can be downloaded here: vanilla, STUD-ResNet, and STUD-RegNet.

Evaluation

Evaluation with BDD100K as the in-distribution dataset

First, run on the in-distribution dataset:

python -m tools.train_net --config-file ./configs/BDD100k/stud_resnet.yaml --num-gpus 4 --eval-only MODEL.WEIGHTS address/model_final.pth

where "address" is specified in the corresponding yaml file.

Then run on the OOD dataset (COCO):

python -m tools.train_net --config-file ./configs/BDD100k/stud_resnet_ood_coco.yaml --num-gpus 4 --eval-only MODEL.WEIGHTS address/model_final.pth
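
The two evaluation runs above differ only in the config file, so they can be scripted. This is a small sketch that assembles the same command lines shown in this README; build_command and evaluation_commands are hypothetical helpers, and the launch itself is left commented out.

```python
import shlex


def build_command(config_file, num_gpus=4, weights=None):
    """Assemble a tools.train_net call; add eval flags when weights are given."""
    cmd = [
        "python", "-m", "tools.train_net",
        "--config-file", config_file,
        "--num-gpus", str(num_gpus),
    ]
    if weights is not None:
        cmd += ["--eval-only", "MODEL.WEIGHTS", weights]
    return cmd


def evaluation_commands(weights, num_gpus=4):
    """Return the ID run followed by the OOD (COCO) run for STUD-ResNet on BDD."""
    configs = [
        "./configs/BDD100k/stud_resnet.yaml",
        "./configs/BDD100k/stud_resnet_ood_coco.yaml",
    ]
    return [build_command(cfg, num_gpus, weights) for cfg in configs]


if __name__ == "__main__":
    for cmd in evaluation_commands("address/model_final.pth"):
        print(" ".join(shlex.quote(part) for part in cmd))
        # import subprocess; subprocess.run(cmd, check=True)  # uncomment to launch
```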

Obtain the metrics using:

python bdd_coco.py --energy 1 --model xxx

Here "--model" means the name of the directory that contains the checkpoint file. Evaluation on nuImages is similar.
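
What bdd_coco.py reports is not spelled out here. As general background, OOD detection is commonly summarised by AUROC and by the false positive rate at 95% true positive rate. The sketch below computes both from two lists of scores, assuming that a higher score means "more in-distribution"; it is a generic illustration and not necessarily how bdd_coco.py is implemented.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID score exceeds a random OOD score.

    Ties count as half a win. Quadratic in the number of scores, which is
    fine for a small sanity check.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD scores accepted by the threshold keeping 95% of ID scores."""
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of top-ranked ID scores that covers at least 95% of them.
    keep = math.ceil(0.95 * len(ranked))
    threshold = ranked[keep - 1]
    return sum(score >= threshold for score in ood_scores) / len(ood_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    id_scores = [0.9, 0.8, 0.7, 0.6]
    ood_scores = [0.1, 0.2, 0.65, 0.3]
    print("AUROC:", auroc(id_scores, ood_scores))
    print("FPR95:", fpr_at_95_tpr(id_scores, ood_scores))
```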

Citation

If you find any part of this code useful in your research, please consider citing our paper:

  @article{du2022stud,
      title={Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild}, 
      author={Du, Xuefeng and Wang, Xin and Gozum, Gabriel and Li, Yixuan},
      journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2022}
}