GRAtt-VIS

This is the official PyTorch implementation of GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation. In this repository, we provide PyTorch code for training and testing our proposed GRAtt-VIS model. GRAtt-VIS is an efficient video instance segmentation and tracking model that achieves state-of-the-art results on several benchmarks, including YTVIS-19/21/22 and OVIS.

Updates

  • Jun 14, 2023: Code is now available!

Installation

GRAtt-VIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_grattvis.py, that trains all the configs provided in GRAtt-VIS. To train a model with train_net_grattvis.py on VIS, first set up the corresponding datasets following Preparing Datasets. Then start training from the weights pretrained on the target VIS dataset, available in VITA's Model Zoo:

python3 train_net_genvis.py --num-gpus 4 \
--config-file configs/genvis/ovis/grattvis_R50_bs8.yaml \
MODEL.WEIGHTS weights/vita_r50_ovis.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir

To evaluate a model's performance, use

python3 train_net_genvis.py --num-gpus 1 \
--config-file YOUR_MODEL_PATH/config.yaml \
--eval-only MODEL.WEIGHTS YOUR_MODEL_PATH/model_checkpoint.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir
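
When several configs or checkpoints need to be run, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper `build_command` and the constants `SCRIPT` and `GRATT_OVERRIDES` are hypothetical names introduced here for illustration, while the script name, flags, and `MODEL.GENVIS.*` keys are copied verbatim from the commands above. It only builds and prints the argument list; it does not launch training.

```python
import shlex
from typing import Dict, List, Optional

# Script name and override keys are copied from the commands in this README.
# The helper itself is hypothetical and is not shipped with GRAtt-VIS.
SCRIPT = "train_net_genvis.py"
GRATT_OVERRIDES = {
    "MODEL.GENVIS.USE_MEM": "False",
    "MODEL.GENVIS.GATED_PROP": "True",
}


def build_command(
    config_file: str,
    weights: str,
    output_dir: str,
    num_gpus: int = 1,
    eval_only: bool = False,
    extra_overrides: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the argument list for a training or evaluation run."""
    cmd = ["python3", SCRIPT, "--num-gpus", str(num_gpus),
           "--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    # Trailing KEY VALUE pairs override entries of the YAML config.
    overrides = {"MODEL.WEIGHTS": weights, **GRATT_OVERRIDES,
                 "OUTPUT_DIR": output_dir}
    overrides.update(extra_overrides or {})
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


train_cmd = build_command(
    "configs/genvis/ovis/grattvis_R50_bs8.yaml",
    "weights/vita_r50_ovis.pth",
    "your_output_dir",
    num_gpus=4,
)
# Print the command for inspection; pass the list to
# subprocess.run(train_cmd, check=True) to actually execute it.
print(shlex.join(train_cmd))
```

Calling `build_command(..., eval_only=True)` with a checkpoint path yields the evaluation variant. Keeping the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.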

Model Zoo

YouTubeVIS-2019

Backbone  AP    AP50  AP75  AR1   AR10  Download
R-50      50.4  70.7  55.2  48.4  58.7  model
Swin-L    63.1  85.6  67.2  55.5  67.8  model

YouTubeVIS-2021

Backbone  AP    AP50  AP75  AR1   AR10  Download
R-50      48.9  69.2  53.1  41.8  56.0  model
Swin-L    60.3  81.3  67.1  48.8  64.5  model

YouTubeVIS-2022

Backbone  AP    AP50  AP75  AR1   AR10  Download
R-50      40.8  60.1  45.9  35.7  46.9  model
Swin-L    52.6  74.0  57.9  45.0  57.1  model

OVIS

Backbone  AP    AP50  AP75  AR1   AR10  Download
R-50      36.2  60.8  36.8  16.8  40.0  model
Swin-L    45.7  69.1  47.8  19.2  49.4  model

License

The majority of GRAtt-VIS is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), GENVIS (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GRAtt-VIS

If you find GRAtt-VIS useful in your research and wish to refer to the baseline results, please use the following BibTeX entry as a citation.

@article{hannan2023gratt,
  title={GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation},
  author={Hannan, Tanveer and Koner, Rajat and Bernhard, Maximilian and Shit, Suprosanna and Menze, Bjoern and Tresp, Volker and Schubert, Matthias and Seidl, Thomas},
  journal={arXiv preprint arXiv:2305.17096},
  year={2023}
}

Acknowledgement

We acknowledge the following repositories, from which we have inherited code snippets.

  1. Detectron2
  2. Mask2Former
  3. VITA
  4. GENVIS