CAROQ

Context-Aware Relative Object Queries to Unify Video Instance and Panoptic Segmentation (CVPR 2023)

by Anwesa Choudhuri, Girish Chowdhary, and Alexander G. Schwing

[Project Page] [Paper] [BibTeX]

We develop a simple approach that unifies multiple video segmentation tasks, namely video instance segmentation, multi-object tracking and segmentation, and video panoptic segmentation, using the propagation of context-aware relative object queries (CAROQ).
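For intuition, here is a minimal, hypothetical PyTorch sketch of the propagation idea, not the actual CAROQ implementation: the object queries refined on one frame are reused as the starting queries for the next frame, so the same query can keep following the same object across time.

```python
import torch
from torch import nn

class QueryPropagationSketch(nn.Module):
    """Illustrative sketch only: decode each frame with object queries
    carried over from the previous frame, so instance identities persist
    across time. Names and architecture here are hypothetical."""

    def __init__(self, num_queries: int = 100, dim: int = 256):
        super().__init__()
        # Learned initial object queries, used only for the first frame.
        self.init_queries = nn.Embedding(num_queries, dim)
        decoder_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

    def forward(self, frame_features: list[torch.Tensor]) -> list[torch.Tensor]:
        # frame_features: per-frame feature maps flattened to (B, HW, dim).
        batch_size = frame_features[0].size(0)
        queries = self.init_queries.weight.unsqueeze(0).expand(batch_size, -1, -1)
        outputs = []
        for feats in frame_features:
            # Decode the current frame conditioned on the propagated queries;
            # the refined queries become the input queries of the next frame.
            queries = self.decoder(queries, feats)
            outputs.append(queries)
        return outputs
```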

Installation

See INSTALL.md.

Getting Started

Dataset preparation

See datasets/README.md.

Download Models

Please create a directory called ./models under the home directory and place in it all initial models for training, as well as trained models for evaluation. For training, we start with Mask2Former models; the specific models used for initialization are listed in the config files for each dataset. Model paths can also be specified in the config files.
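For reference, Detectron2-based repositories such as Mask2Former typically take the checkpoint path from the MODEL.WEIGHTS config key. A minimal sketch assuming that convention (the config and checkpoint file names below are hypothetical; the real ones are in this repo's config files):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.set_new_allowed(True)  # tolerate repo-specific config keys when merging
cfg.merge_from_file("configs/your_dataset/your_config.yaml")  # hypothetical path
cfg.MODEL.WEIGHTS = "./models/your_initial_checkpoint.pkl"    # hypothetical filename
```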

Trained models are coming soon!

Training/Evaluation

See train_eval_script.sh.

Citation

If you find the code or paper useful, please cite it using the following BibTeX entry.

@InProceedings{Choudhuri_2023_CVPR,
    author    = {Choudhuri, Anwesa and Chowdhary, Girish and Schwing, Alexander G.},
    title     = {Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {6377-6386}
}

Acknowledgement

Code is largely based on Mask2Former. We also borrow parts from Transformer-XL.

This work is supported in part by Agriculture and Food Research Initiative (AFRI) grant no. 2020-67021-32799 / project accession no. 1024178 from the USDA National Institute of Food and Agriculture (NSF/USDA National AI Institute: AIFARMS). We also thank the Illinois Center for Digital Agriculture for seed funding for this project. The work is also supported in part by NSF under Grants 2008387, 2045586, 2106825, and MRI 1725729.