GRAtt-VIS

This is the official PyTorch implementation of GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation. This repository provides PyTorch code for training and testing the proposed GRAtt-VIS model. GRAtt-VIS is an efficient video instance segmentation and tracking model that achieves state-of-the-art results on several benchmarks, including YTVIS-19/21/22 and OVIS.

Updates

Jun 14, 2023: Code is now available!

Installation

GRAtt-VIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_grattvis.py, for training all the configs provided in GRAtt-VIS. To train a model with this script on VIS, first set up the corresponding datasets following Preparing Datasets. Then start training from the weights pretrained on the target VIS dataset, which are available in VITA's Model Zoo:

python3 train_net_genvis.py --num-gpus 4 \
--config-file configs/genvis/ovis/grattvis_R50_bs8.yaml \
MODEL.WEIGHTS weights/vita_r50_ovis.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir
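
The trailing arguments in the command above appear to follow the Detectron2 convention of alternating config keys and values. As a minimal sketch, the same command can be assembled in Python so that each override stays paired with its value. The helper `build_command` and its parameters are hypothetical and not part of this repository; the script name, config path, and override keys are copied from the command above.

```python
import shlex


def build_command(script, config_file, num_gpus, overrides, eval_only=False):
    """Assemble a launch command as an argument list (hypothetical helper)."""
    cmd = ["python3", script, "--num-gpus", str(num_gpus),
           "--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    # Overrides are appended as alternating KEY VALUE tokens.
    for key, value in overrides.items():
        cmd.extend([key, str(value)])
    return cmd


train_cmd = build_command(
    script="train_net_genvis.py",
    config_file="configs/genvis/ovis/grattvis_R50_bs8.yaml",
    num_gpus=4,
    overrides={
        "MODEL.WEIGHTS": "weights/vita_r50_ovis.pth",
        "MODEL.GENVIS.USE_MEM": False,
        "MODEL.GENVIS.GATED_PROP": True,
        "OUTPUT_DIR": "your_output_dir",
    },
)

# Print the command for inspection before launching anything.
print(shlex.join(train_cmd))
```

The sketch only prints the command; passing `train_cmd` to `subprocess.run` would launch it.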

To evaluate a model's performance, use:

python3 train_net_genvis.py --num-gpus 1 \
--config-file YOUR_MODEL_PATH/config.yaml \
--eval-only MODEL.WEIGHTS YOUR_MODEL_PATH/model_checkpoint.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir
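
The evaluation command above expects a config file and a checkpoint inside YOUR_MODEL_PATH. A minimal sketch of a pre-flight check, assuming the two file names used in that command, is shown here. The helper `missing_eval_files` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_eval_files(model_dir):
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    # File names are taken from the evaluation command above.
    expected = ["config.yaml", "model_checkpoint.pth"]
    return [name for name in expected if not (Path(model_dir) / name).is_file()]


if __name__ == "__main__":
    # YOUR_MODEL_PATH is the same placeholder used in the command above.
    missing = missing_eval_files("YOUR_MODEL_PATH")
    if missing:
        print("Missing before evaluation:", ", ".join(missing))
    else:
        print("All evaluation inputs found.")
```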

Model Zoo

YouTubeVIS-2019

Backbone	AP	AP50	AP75	AR1	AR10	Download
R-50	50.4	70.7	55.2	48.4	58.7	model
Swin-L	63.1	85.6	67.2	55.5	67.8	model

YouTubeVIS-2021

Backbone	AP	AP50	AP75	AR1	AR10	Download
R-50	48.9	69.2	53.1	41.8	56.0	model
Swin-L	60.3	81.3	67.1	48.8	64.5	model

YouTubeVIS-2022

Backbone	AP	AP50	AP75	AR1	AR10	Download
R-50	40.8	60.1	45.9	35.7	46.9	model
Swin-L	52.6	74.0	57.9	45.0	57.1	model

OVIS

Backbone	AP	AP50	AP75	AR1	AR10	Download
R-50	36.2	60.8	36.8	16.8	40.0	model
Swin-L	45.7	69.1	47.8	19.2	49.4	model

License

The majority of GRAtt-VIS is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), GENVIS (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GRAtt-VIS

If you find GRAtt-VIS useful in your research and wish to refer to the baseline results, please use the following BibTeX entry as a citation.

@article{hannan2023gratt,
  title={GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation},
  author={Hannan, Tanveer and Koner, Rajat and Bernhard, Maximilian and Shit, Suprosanna and Menze, Bjoern and Tresp, Volker and Schubert, Matthias and Seidl, Thomas},
  journal={arXiv preprint arXiv:2305.17096},
  year={2023}
}

Acknowledgement

We acknowledge the following repositories, from which we have inherited code snippets.

Tanveer81/GRAttVIS