This is a forked version of Detectron2 with modifications for proposal generation, as described in "Opening up Open-World Tracking".
To see the modifications (NMS, scoring, and appearance-embedding extraction) in detail, search globally for "OWT".
```bash
git clone git@github.com:YangLiu14/detectron2-OWT.git
python -m pip install -e detectron2-OWT
```
***Attention***: if you have previously installed the detectron2 package, remove it before the new installation:

```bash
cd detectron2-OWT
pip uninstall detectron2
rm -rf build/ **/*.so
cd ..
```
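After reinstalling, a quick way to verify that Python picks up the forked copy is to check where the package resolves from (a minimal sanity check, not part of the repo's scripts):

```python
# Minimal sanity check: the import should resolve to the detectron2-OWT checkout,
# not to a leftover installation of upstream Detectron2.
import detectron2

print(detectron2.__file__)  # expected to point inside detectron2-OWT/
```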
We use pretrained models from the Detectron2 Model Zoo and configure them to be (see the sketch after this list):
- no NMS
- no confidence threshold
- category-agnostic
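The configs under `./configs/Misc/owt/` are the reference for these settings. Purely as an illustration, here is a minimal sketch (not code from this repo) of the standard Detectron2 config keys that approximate this behaviour when starting from a stock Model Zoo config:

```python
# Illustration only: standard Detectron2 config keys that approximate
# "no NMS, no confidence threshold, category-agnostic" on a stock Model Zoo config.
# The actual OWT settings live in ./configs/Misc/owt/.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml")

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.0       # keep every detection (no confidence threshold)
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 1.0         # IoU threshold of 1.0 effectively disables NMS
cfg.MODEL.ROI_BOX_HEAD.CLS_AGNOSTIC_BBOX_REG = True  # category-agnostic box regression
cfg.MODEL.ROI_MASK_HEAD.CLS_AGNOSTIC_MASK = True     # category-agnostic mask prediction
```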
We provide two examples:
| Name | box AP | mask AP | model id | download |
|---|---|---|---|---|
| Panoptic FPN R101 | 47.4 | 41.3 | 139797668 | model |
| R101-FPN-400ep (new baseline) | 48.9 | 43.7 | 42047764 | model |
To generate the same set of proposals that we used in the paper "Opening up Open-World Tracking", use the following command (replace the `--input` and `--outdir` paths with your own):
```bash
python owt_scripts/gen_proposals.py \
    --config-file ./configs/Misc/owt/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml \
    --input /data/TAO/frames/ \
    --outdir /proposals/val/npz/ \
    --split val \
    --opts MODEL.WEIGHTS /model_weights/Panoptic_FPN_R101/model_final_be35db.pkl
```
To process only the annotated frames, add the `--annot-only` flag:

```bash
python owt_scripts/gen_proposals.py \
    --config-file ./configs/Misc/owt/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml \
    --input /data/TAO/frames/ \
    --outdir /proposals/val/npz/ \
    --split val --annot-only \
    --opts MODEL.WEIGHTS /model_weights/Panoptic_FPN_R101/model_final_be35db.pkl
```
Useful arguments:

- `--video_src_names`: TAO contains 7 source datasets; if you only want to run inference on some of them, pass the corresponding name(s), e.g. `--video_src_names ArgoVerse BDD YFCC100M`.
- `--vidx_start` and `--vidx_end`: each video source (e.g. ArgoVerse) can contain hundreds of video sequences. These two arguments specify the start and end index of the sequences to process, which makes it easy to split the work across multiple GPUs and accelerate generation, as sketched below.
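For example, a hypothetical launcher (not part of this repo) could split one video source over two GPUs by giving each process its own index range; the GPU ids and index ranges below are made up:

```python
# Hypothetical multi-GPU launcher: runs gen_proposals.py once per GPU, each process
# covering a different slice of the ArgoVerse sequences via --vidx_start / --vidx_end.
import os
import subprocess

COMMON_ARGS = [
    "python", "owt_scripts/gen_proposals.py",
    "--config-file", "./configs/Misc/owt/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml",
    "--input", "/data/TAO/frames/",        # give your own path
    "--outdir", "/proposals/val/npz/",     # give your own path
    "--split", "val",
    "--video_src_names", "ArgoVerse",
]
OPTS = ["--opts", "MODEL.WEIGHTS", "/model_weights/Panoptic_FPN_R101/model_final_be35db.pkl"]

# Example index ranges (made up); adjust to the number of sequences in the source.
slices = [(0, 100), (100, 200)]

procs = []
for gpu_id, (start, end) in enumerate(slices):
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
    cmd = COMMON_ARGS + ["--vidx_start", str(start), "--vidx_end", str(end)] + OPTS
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```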
If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
```BibTeX
@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
```
**Opening up Open-World Tracking**
Yang Liu\*, Idil Esen Zulfikar\*, Jonathon Luiten\*, Achal Dave\*, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé
\*Equal contribution
CVPR 2022