microsoft/X-Decoder

Extension to Video Datasets

kanji95 opened this issue · 1 comments

How do we extend x-decoder to video datasets? In the appendix, it is mentioned that the model generalizes to generic segmentation and referring segmentation on videos.

Thanks for your question, the evaluated video dataset is simply evaluated frame by frame. For referring segmentation, the referring phrase is a natural tracking id, for segmentation we didn't apply any tracking.