Is XMem class agnostic?
Does the model not care about the value of the class index?
During both training and testing, I see that the code converts the mask to continuous indices. For example, say my dataset has classes 1, 2, 3, 4, 5, 6, and one of the videos contains only classes 4 and 6. During training, the dataset converts these class indices to 1 and 2 instead of 4 and 6. The same happens during testing, where MaskMapper converts non-continuous class indices to continuous values.
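The remapping described in the question can be sketched as follows. This is a minimal NumPy illustration, assuming a 2-D integer label mask with 0 as background; `remap_to_continuous` is a hypothetical helper, not XMem's dataset or MaskMapper code.

```python
import numpy as np

def remap_to_continuous(mask: np.ndarray):
    """Map the non-zero labels in `mask` to 1..K (sorted order); 0 stays background."""
    labels = np.unique(mask)          # sorted unique labels
    labels = labels[labels != 0]      # drop background
    mapping = {int(old): new for new, old in enumerate(labels, start=1)}
    remapped = np.zeros_like(mask)
    for old, new in mapping.items():
        remapped[mask == old] = new
    return remapped, mapping


# A frame that contains only dataset classes 4 and 6.
mask = np.array([[0, 4, 4],
                 [6, 6, 0]])
remapped, mapping = remap_to_continuous(mask)
print(mapping)  # {4: 1, 6: 2}
```

Because a mapping like this is rebuilt per video, the same dataset class can receive a different id in each video, which is the behavior the question describes.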
So technically, if I fine-tune the model on my data, all it learns is the general appearance of my data and how to track the objects in it, rather than any particular labels. The model never has a fixed notion of labels: class 4 can be mapped to id=1 in one video and to id=2 in another.
If this is the case, how does the model handle a new class that appears in the middle of a video, whose mask was not present in the first-frame mask, during both training and inference?
Finally, what is the use of the CE loss (bootstrapped or normal), since the mapping from object to class_id is never the same throughout the dataset? E.g., class DOG can be id 1 in one video and id 5 in another.
It is class-agnostic; that's how it generalizes to unseen object categories. Without user input (a mask for the new object), it cannot segment an object that first appears in the middle of the video. For class-specific algorithms, you can look at Video Instance Segmentation.
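To illustrate the point about objects appearing mid-video: with a per-video label mapping, an object only becomes trackable once a mask containing it is supplied. The sketch below assumes a hypothetical `IncrementalMapper` class; it is not XMem's MaskMapper API and leaves out the memory and network entirely.

```python
import numpy as np

class IncrementalMapper:
    """Hypothetical per-video mapper from dataset labels to continuous object ids.

    Not XMem's MaskMapper; it only shows that an object gets an id
    once a mask containing it has been supplied.
    """

    def __init__(self):
        self.mapping = {}  # dataset label -> continuous object id (1..K)

    def register(self, mask=None):
        """Add labels from a user-provided mask; frames without a mask add nothing."""
        if mask is not None:
            for label in np.unique(mask):
                label = int(label)
                if label != 0 and label not in self.mapping:
                    self.mapping[label] = len(self.mapping) + 1
        return self.mapping


mapper = IncrementalMapper()
mapper.register(np.array([[0, 4], [6, 0]]))  # first frame: classes 4 and 6
mapper.register()                            # later frame, no mask: nothing new
mapper.register(np.array([[2, 0], [0, 0]]))  # user supplies a mask for class 2
print(mapper.mapping)  # {4: 1, 6: 2, 2: 3}
```

A class that never appears in any supplied mask never gets an id, so there is nothing for a class-agnostic tracker to follow.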
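Cross-entropy is used the same way. The ground-truth objects are selected in the same way as the target objects, so within each training sample the ground-truth ids line up with the model's per-object outputs; it does not matter that DOG is id 1 in one video and id 5 in another.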
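Why inconsistent ids across videos don't break cross-entropy can be checked numerically: if object ids are permuted consistently in both the prediction channels and the ground truth of a sample, the loss is unchanged. This is a plain NumPy sketch, assuming channel 0 is background and channels 1..K are objects; `cross_entropy` is a hypothetical helper, not XMem's loss code.

```python
import numpy as np

def cross_entropy(logits, target):
    """Mean pixel-wise cross-entropy.

    logits: (C, H, W) scores, channel 0 = background, 1..K = objects.
    target: (H, W) integer labels in [0, C).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = target.shape
    # Pick log p(target class) at every pixel via advanced indexing.
    picked = log_probs[target, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-picked.mean())


rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4))       # background + 2 objects
target = rng.integers(0, 3, size=(4, 4))

# Relabel objects: new channel c holds old channel perm[c] (swap ids 1 and 2).
perm = np.array([0, 2, 1])
logits_relabeled = logits[perm]
target_relabeled = np.argsort(perm)[target]  # old label -> new label

print(np.isclose(cross_entropy(logits, target),
                 cross_entropy(logits_relabeled, target_relabeled)))  # True
```

Every per-pixel loss is the same after relabeling, so a bootstrapped variant that averages only the hardest pixels (as such losses are commonly implemented) is unaffected in the same way.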