Extension to Video Datasets
kanji95 opened this issue · 1 comment
kanji95 commented
How do we extend X-Decoder to video datasets? The appendix mentions that the model generalizes to generic segmentation and referring segmentation on videos.
MaureenZOU commented
Thanks for your question. The video datasets are simply evaluated frame by frame. For referring segmentation, the referring phrase serves as a natural tracking ID; for generic segmentation, we didn't apply any tracking.
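For readers who want to see what "frame by frame, with the phrase as the tracking ID" could look like, here is a minimal sketch. It is not the X-Decoder evaluation code: `segment_video_by_phrase`, `toy_segmenter`, and the nested-list frame format are all hypothetical names invented for illustration, and the toy segmenter just thresholds pixel values in place of a real model call.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[List[int]]   # hypothetical frame format: grayscale pixels as nested lists
Mask = List[List[bool]]   # one boolean per pixel


def segment_video_by_phrase(
    frames: Sequence[Frame],
    phrases: Sequence[str],
    segment_frame: Callable[[Frame, str], Mask],
) -> Dict[str, List[Mask]]:
    """Run a per-frame referring segmenter independently on every frame.

    No temporal model is used. Masks are grouped by the phrase that
    produced them, so the phrase itself acts as the track identity.
    """
    tracks: Dict[str, List[Mask]] = {phrase: [] for phrase in phrases}
    for frame in frames:
        for phrase in phrases:
            tracks[phrase].append(segment_frame(frame, phrase))
    return tracks


def toy_segmenter(frame: Frame, phrase: str) -> Mask:
    """Stand-in for a real model (assumption): thresholds pixel brightness."""
    want_bright = phrase == "bright object"
    return [[(px > 127) == want_bright for px in row] for row in frame]


# Two 2x2 frames in which a single bright pixel moves between frames.
video = [
    [[0, 200], [0, 0]],
    [[0, 0], [200, 0]],
]
tracks = segment_video_by_phrase(video, ["bright object"], toy_segmenter)
```

Here `tracks["bright object"]` holds one mask per frame, in frame order, which is all the "tracking" there is. For generic segmentation there is no phrase to key on, so per-frame predictions would simply be scored independently without linking instances across frames.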