2022CVPR-MMMMTBVS

This is the code for the CVPR 2022 paper "Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation".


πŸ”₯πŸ”₯πŸ”₯Coming SoonπŸ”₯πŸ”₯πŸ”₯

Usage

  1. Download the A2D-Sentences and JHMDB-Sentences datasets.

  2. Use RAFT to generate the optical flow map for each frame.

  3. Organize them as follows:

your_dataset_dir/
├── A2D/
│   ├── allframes/
│   ├── allframes_flow/
│   ├── Annotations_visualize/
│   └── a2d_txt/
│       ├── train.txt
│       └── test.txt
└── J-HMDB/
    ├── allframes/
    ├── allframes_flow/
    ├── Annotations_visualize/
    └── jhmdb_txt/
        ├── train.txt
        └── test.txt

"Annotations_visualize" contains the GT masks for each target object. We have upload them to BaiduPan(lo50) for convenience.

Train

Coming Soon

Inference

Coming Soon

Citation

Please consider citing our work in your publications if you are interested in our research:

Coming Soon