A Generalized Framework for Video Instance Segmentation

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

[arXiv] [BibTeX]

Updates

Jan 20, 2023: Code is now available!

Installation

GenVIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_genvis.py, that trains all the configs provided in GenVIS.

To train a model with train_net_genvis.py on a VIS dataset, first set up the corresponding dataset following Preparing Datasets.

Then start training from the VITA weights pretrained on the target VIS dataset, which are available in VITA's Model Zoo:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  MODEL.WEIGHTS vita_r50_ovis.pth

To evaluate a model's performance, run:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
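
When launching several runs, it can help to assemble the two commands above programmatically. The following is a minimal sketch, not part of GenVIS: the helper name build_genvis_command is hypothetical, and it uses only the flags and paths shown above (--num-gpus, --config-file, --eval-only, MODEL.WEIGHTS).

```python
import shlex


def build_genvis_command(config_file, weights, num_gpus=4, eval_only=False):
    """Assemble the argument list for train_net_genvis.py.

    Hypothetical convenience helper (an assumption, not shipped with GenVIS).
    It only mirrors the flags shown in the commands above.
    """
    cmd = [
        "python", "train_net_genvis.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    if eval_only:
        # Evaluation reuses the same script with --eval-only added.
        cmd.append("--eval-only")
    # Config overrides such as MODEL.WEIGHTS come last, as KEY VALUE pairs.
    cmd += ["MODEL.WEIGHTS", weights]
    return cmd


config = "configs/genvis/ovis/genvis_R50_bs8_online.yaml"
train_cmd = build_genvis_command(config, "vita_r50_ovis.pth")
eval_cmd = build_genvis_command(config, "/path/to/checkpoint_file", eval_only=True)

# Print shell-ready strings; pass the lists to subprocess.run to execute them.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The lists can be passed directly to subprocess.run(train_cmd, check=True) from the repository root once the datasets and weights are in place.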

Model Zoo

Additional weights will be uploaded soon!

YouTubeVIS-2019

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	50.0	71.5	54.6	49.5	59.7	model
R-50	semi-online	51.3	72.0	57.8	49.5	60.0	model
Swin-L	online	64.0	84.9	68.3	56.1	69.4	model
Swin-L	semi-online	63.8	85.7	68.5	56.3	68.4	model

YouTubeVIS-2021

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	47.1	67.5	51.5	41.6	54.7	~~model~~
R-50	semi-online	46.3	67.0	50.2	40.6	53.2	model
Swin-L	online	59.6	80.9	65.8	48.7	65.0	~~model~~
Swin-L	semi-online	60.1	80.9	66.5	49.1	64.7	~~model~~

OVIS

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	35.8	60.8	36.2	16.3	39.6	model
R-50	semi-online	34.5	59.4	35.0	16.6	38.3	model
Swin-L	online	45.2	69.1	48.4	19.1	48.6	~~model~~
Swin-L	semi-online	45.4	69.2	47.8	18.9	49.0	~~model~~

License

The majority of GenVIS is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GenVIS

If you use GenVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{GenVIS,
  title={A Generalized Framework for Video Instance Segmentation},
  author={Heo, Miran and Hwang, Sukjun and Hyun, Jeongseok and Kim, Hanjung and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={arXiv preprint arXiv:2211.08834},
  year={2022}
}

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, Deformable DETR, and VITA. We are truly grateful for their excellent work.
