UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Official implementation of UniRef++, an extended version of UniRef (ICCV 2023).

Highlights

UniRef/UniRef++ is a unified model for four object segmentation tasks: referring image segmentation (RIS), few-shot segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS).
At the core of UniRef++ is the UniFusion module, which injects various kinds of reference information into the network. It is implemented efficiently with flash attention.
UniFusion can serve as a plug-in component for foundation models such as SAM.
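
To make the idea of injecting reference information more concrete, here is a minimal NumPy sketch of attention-based fusion. It is not the UniFusion implementation from this repository: the function name `cross_attention_fuse`, the projection matrices, the residual connection, and the tensor shapes are all assumptions chosen for illustration. It uses plain scaled dot-product attention; flash attention computes the same attention output with lower memory use.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(visual, reference, w_q, w_k, w_v):
    """Hypothetical fusion step: visual tokens attend to reference tokens.

    visual:    (N, C) flattened features of the current image or frame
    reference: (M, C) reference tokens (e.g. from a language expression
               or from an annotated reference image/frame)
    """
    q = visual @ w_q                           # queries from visual features
    k = reference @ w_k                        # keys from the reference
    v = reference @ w_v                        # values from the reference
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) scaled similarities
    attn = softmax(scores, axis=-1)            # each row sums to 1
    return visual + attn @ v                   # residual injection (assumed)


# Toy usage with random data; all sizes are arbitrary.
rng = np.random.default_rng(0)
C = 8
visual = rng.standard_normal((16, C))
reference = rng.standard_normal((5, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
fused = cross_attention_fuse(visual, reference, w_q, w_k, w_v)
print(fused.shape)  # (16, 8)
```

Because the reference enters only through the keys and values, the same function accepts language tokens or mask-annotated visual tokens, which is one way a single module could serve all four tasks.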

Schedule

Results


Referring Image Segmentation

Referring Video Object Segmentation

Video Object Segmentation

Zero-shot Video Segmentation & Few-shot Image Segmentation

Model Zoo

Results are reported on the validation set of each benchmark.

Model	RefCOCO	FSS-1000	Ref-Youtube-VOS	Ref-DAVIS17	Youtube-VOS18	DAVIS17	LVOS	Checkpoint
UniRef++-R50	75.6	79.1	61.5	63.5	81.9	81.5	60.1	model
UniRef++-Swin-L	79.1	85.4	66.9	67.2	83.2	83.9	67.2	model

Installation

See INSTALL.md

Getting Started

Please see DATA.md for data preparation.

Please see EVALUATION.md for evaluation.

Citation

If you find this project useful in your research, please consider citing:

@article{wu2023uniref++,
  title={UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces},
  author={Wu, Jiannan and Jiang, Yi and Yan, Bin and Lu, Huchuan and Yuan, Zehuan and Luo, Ping},
  journal={arXiv preprint arXiv:2312.15715},
  year={2023}
}

@inproceedings{wu2023uniref,
  title={Segment Every Reference Object in Spatial and Temporal Spaces},
  author={Wu, Jiannan and Jiang, Yi and Yan, Bin and Lu, Huchuan and Yuan, Zehuan and Luo, Ping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2538--2550},
  year={2023}
}

Acknowledgement

This project is based on the UNINEXT codebase. We also refer to the Detectron2, Deformable DETR, STCN, and SAM repositories. Thanks for their awesome work!
