[CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Primary Language: Python