
MPVSS: Mask Propagation for Efficient Video Semantic Segmentation

[NeurIPS 2023] This is the official repository for our paper "Mask Propagation for Efficient Video Semantic Segmentation" (MPVSS) by Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang and Bohan Zhuang.

Introduction

We present a simple yet effective mask propagation framework, dubbed MPVSS, for efficient video semantic segmentation (VSS). Specifically, a strong query-based image segmentor processes the key frames and produces accurate binary masks and class predictions. We then estimate a dedicated flow map for each segment-level mask prediction of the key frame. Finally, the key-frame mask predictions are warped to the other, non-key frames via these query-based flow maps.
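
For intuition, the snippet below is a minimal PyTorch sketch of this propagation step, not the official implementation: the tensor names (`key_masks`, `key_logits`, `flows`), the assumed shapes, and the backward-warping convention for the query-wise flow maps are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the official code) of warping key-frame
# per-query mask predictions to a non-key frame with query-wise flow maps.
#   key_masks:  (N, H, W)    per-query mask logits from the key frame
#   key_logits: (N, C)       per-query class predictions from the key frame
#   flows:      (N, 2, H, W) one flow map per query, pointing from the
#                            non-key frame back to the key frame
import torch
import torch.nn.functional as F

def warp_key_masks(key_masks, flows):
    """Warp each key-frame mask to the current frame with its own flow map."""
    n, h, w = key_masks.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0)                      # (2, H, W)
    coords = grid.unsqueeze(0) + flows                       # (N, 2, H, W)
    # Normalize sampling locations to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (N, H, W, 2)
    warped = F.grid_sample(
        key_masks.unsqueeze(1),                              # (N, 1, H, W)
        sample_grid,
        mode="bilinear",
        align_corners=True,
    )
    return warped.squeeze(1)                                 # (N, H, W)

def propagate(key_masks, key_logits, flows):
    """Combine warped per-query masks with the key frame's class predictions
    into a semantic map for the non-key frame (Mask2Former-style aggregation;
    no-object class handling is omitted for brevity)."""
    warped_masks = warp_key_masks(key_masks, flows).sigmoid()  # (N, H, W)
    class_probs = key_logits.softmax(dim=-1)                   # (N, C)
    # Per-pixel class scores: sum over queries of class prob * mask prob.
    sem_scores = torch.einsum("nc,nhw->chw", class_probs, warped_masks)
    return sem_scores.argmax(dim=0)                            # (H, W) labels
```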

Installation

See the installation instructions for Mask2Former.


Data preparation

  1. Download the VSPW dataset from https://www.vspwdataset.com/
  2. Create a symlink to the dataset:
ln -s /path/to/your/dataset datasets/vspw

Training and Evaluation

sh run.sh

Experimental Results

VSPW

| Backbone | mIoU | WIoU | VC_8 | VC_16 | GFLOPs | #Params (M) | FPS |
|----------|------|------|------|-------|--------|-------------|-----|
| R50      | 37.5 | 59.0 | 84.1 | 77.2  | 38.9   | 84.1        | 33.93 |
| R101     | 38.8 | 59.0 | 84.8 | 79.6  | 45.1   | 103.1       | 32.38 |
| Swin-T   | 39.9 | 62.0 | 85.9 | 80.4  | 39.7   | 114.0       | 32.86 |
| Swin-S   | 40.4 | 62.0 | 86.0 | 80.7  | 47.3   | 108.0       | 30.61 |
| Swin-B   | 52.6 | 68.4 | 89.5 | 85.9  | 61.5   | 147.0       | 27.38 |
| Swin-L   | 53.9 | 69.1 | 89.6 | 85.8  | 97.3   | 255.4       | 23.22 |

Cityscapes

| Backbone | mIoU | GFLOPs | #Params (M) | FPS |
|----------|------|--------|-------------|-----|
| R50      | 78.4 | 173.2  | 84.1        | 13.43 |
| R101     | 78.2 | 204.3  | 103.1       | 12.55 |
| Swin-T   | 80.7 | 175.9  | 114.0       | 12.33 |
| Swin-S   | 81.3 | 213.2  | 108.0       | 10.98 |
| Swin-B   | 81.7 | 278.6  | 147.0       | 9.54  |
| Swin-L   | 81.6 | 449.5  | 255.4       | 7.24  |

If you find this repository or our paper useful, please consider citing:

@inproceedings{weng2023mask,
  title={Mask Propagation for Efficient Video Semantic Segmentation},
  author={Weng, Yuetian and Han, Mingfei and He, Haoyu and Li, Mingjie and Yao, Lina and Chang, Xiaojun and Zhuang, Bohan},
  booktitle={NeurIPS},
  year={2023}
}

Acknowledgement

The code is largely based on Mask2Former. We thank the authors for their open-source code.