A Generalized Framework for Video Instance Segmentation

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

[arXiv] [BibTeX]


Updates

  • Jan 20, 2023: Code is now available!

Installation

GenVIS is built upon VITA. See the installation instructions.

Getting Started

We provide a script, train_net_genvis.py, that can train all the configs provided in GenVIS.

To train a model with "train_net_genvis.py" on VIS, first set up the corresponding datasets following Preparing Datasets.

Then run it with the pretrained weights for the target VIS dataset from VITA's Model Zoo:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  MODEL.WEIGHTS vita_r50_ovis.pth
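The training command above passes options in two ways: named flags (--num-gpus, --config-file, and --eval-only in the evaluation command) and trailing KEY VALUE pairs such as MODEL.WEIGHTS vita_r50_ovis.pth, which override entries in the YAML config. Assuming train_net_genvis.py follows the standard Detectron2 convention (its default_argument_parser), the sketch below rebuilds a parser with only a subset of those flags to show how such a command line is split up. build_parser is a hypothetical name, not part of GenVIS, and the real Detectron2 parser has additional flags (e.g. for multi-machine runs).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser: a subset of the flags used in the README
    # commands, modeled on Detectron2's default_argument_parser convention.
    parser = argparse.ArgumentParser(description="GenVIS training (sketch)")
    parser.add_argument("--config-file", default="", metavar="FILE")
    parser.add_argument("--num-gpus", type=int, default=1)
    parser.add_argument("--eval-only", action="store_true")
    # Everything after the named flags is collected verbatim as
    # space-separated "PATH.KEY VALUE" config overrides.
    parser.add_argument("opts", nargs=argparse.REMAINDER, default=None)
    return parser


if __name__ == "__main__":
    # Same arguments as the training command in the README.
    argv = [
        "--num-gpus", "4",
        "--config-file", "configs/genvis/ovis/genvis_R50_bs8_online.yaml",
        "MODEL.WEIGHTS", "vita_r50_ovis.pth",
    ]
    args = build_parser().parse_args(argv)
    print(args.num_gpus, args.config_file, args.eval_only, args.opts)
```

Because the overrides are a trailing remainder, they must come after all named flags, which is why MODEL.WEIGHTS appears last in both commands.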

To evaluate a model's performance, use:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
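In Detectron2-style training scripts, --eval-only typically skips training and evaluates the checkpoint given by MODEL.WEIGHTS, so the evaluation command differs from the training one mainly in that flag and the weight path. The trailing KEY VALUE pairs are applied on top of the loaded YAML config; Detectron2 configs are yacs CfgNode objects, which do this with merge_from_list. The following is a simplified, dictionary-based sketch of that merge step, not the yacs implementation: merge_overrides and the toy config are hypothetical, and real yacs additionally checks that each new value's type matches the old one.

```python
import ast
from typing import Any, Dict, List


def _decode(value: str) -> Any:
    # Like yacs, try to read the string as a Python literal (int, float,
    # bool, list, ...); fall back to the raw string, e.g. for file paths.
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def merge_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: apply "PATH.KEY VALUE" pairs to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for full_key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = full_key.split(".")
        node = cfg
        for name in parents:
            child = node.get(name)
            if not isinstance(child, dict):
                raise KeyError(f"Non-existent config key: {full_key}")
            node = child
        # Only existing keys may be overridden, mirroring yacs behavior.
        if leaf not in node:
            raise KeyError(f"Non-existent config key: {full_key}")
        node[leaf] = _decode(raw_value)
    return cfg


if __name__ == "__main__":
    # Toy stand-in for a loaded config; a real GenVIS config has many more keys.
    cfg = {"MODEL": {"WEIGHTS": ""}, "SOLVER": {"IMS_PER_BATCH": 8}}
    merge_overrides(cfg, ["MODEL.WEIGHTS", "/path/to/checkpoint_file"])
    print(cfg["MODEL"]["WEIGHTS"])
```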

Model Zoo

Additional weights will be released soon!

YouTubeVIS-2019

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       50.0  71.5  54.6  49.5  59.7  model
R-50      semi-online  51.3  72.0  57.8  49.5  60.0  model
Swin-L    online       64.0  84.9  68.3  56.1  69.4  model
Swin-L    semi-online  63.8  85.7  68.5  56.3  68.4  model

YouTubeVIS-2021

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       47.1  67.5  51.5  41.6  54.7  model
R-50      semi-online  46.3  67.0  50.2  40.6  53.2  model
Swin-L    online       59.6  80.9  65.8  48.7  65.0  model
Swin-L    semi-online  60.1  80.9  66.5  49.1  64.7  model

OVIS

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       35.8  60.8  36.2  16.3  39.6  model
R-50      semi-online  34.5  59.4  35.0  16.6  38.3  model
Swin-L    online       45.2  69.1  48.4  19.1  48.6  model
Swin-L    semi-online  45.4  69.2  47.8  18.9  49.0  model
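Each Model Zoo table lists an online and a semi-online model per backbone, so a few lines of Python make the comparison explicit. The sketch below copies the AP column from the three tables into a dictionary (MODEL_ZOO_AP and semi_online_gain are illustrative names, not part of GenVIS) and reports semi-online AP minus online AP. On these numbers neither mode is ahead everywhere: semi-online gains 1.3 AP for R-50 on YouTubeVIS-2019 and loses 1.3 AP for R-50 on OVIS.

```python
from typing import Dict, Tuple

# AP values copied from the Model Zoo tables:
# (dataset, backbone) -> (online AP, semi-online AP)
MODEL_ZOO_AP: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("YouTubeVIS-2019", "R-50"): (50.0, 51.3),
    ("YouTubeVIS-2019", "Swin-L"): (64.0, 63.8),
    ("YouTubeVIS-2021", "R-50"): (47.1, 46.3),
    ("YouTubeVIS-2021", "Swin-L"): (59.6, 60.1),
    ("OVIS", "R-50"): (35.8, 34.5),
    ("OVIS", "Swin-L"): (45.2, 45.4),
}


def semi_online_gain(dataset: str, backbone: str) -> float:
    """Semi-online AP minus online AP, rounded to one decimal like the tables."""
    online, semi = MODEL_ZOO_AP[(dataset, backbone)]
    return round(semi - online, 1)


if __name__ == "__main__":
    for dataset, backbone in MODEL_ZOO_AP:
        gain = semi_online_gain(dataset, backbone)
        print(f"{dataset:16s} {backbone:7s} semi-online vs online: {gain:+.1f} AP")
```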

License

The majority of GenVIS is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GenVIS

If you use GenVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entries.

@inproceedings{GenVIS,
  title={A Generalized Framework for Video Instance Segmentation},
  author={Heo, Miran and Hwang, Sukjun and Hyun, Jeongseok and Kim, Hanjung and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={arXiv preprint arXiv:2211.08834},
  year={2022}
}

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, Deformable DETR, and VITA. We are truly grateful for their excellent work.