google-research/deeplab2

Several Questions on TubeFormer-CVPR-2022.

lxtGH opened this issue · 3 comments

lxtGH commented

We have several detailed questions since we cannot find the code.

1. For the VSS task, is the global memory the prediction kernel of the last convolution? Did you use bipartite matching?

2. For the VPS task on KITTI-STEP, in the section “Global memory with split thing and stuff”: did you use bipartite matching for the stuff memory, or apply a cross-entropy loss directly?

3. For both the VPS and DVPS tasks, we are also confused about the prediction label range in the section “Global memory with split thing and stuff”. Is the mask classification for thing and stuff performed jointly or individually? (That is, one joint classification head over C_{thing}+C_{stuff}, or two heads, one for C_{thing} and one for C_{stuff}?)

4. For the DVPS task, how did you handle the unlabeled regions on KITTI-DVPS, given that the labels are very sparse?

5. Will the code be released for reference?
Thanks a lot!

lxtGH commented

Hi! We are big fans of your work. Could you help us better understand it? @mcahny @aquariusjay Thanks a lot!

Hi, thanks for asking.

  • Yes, the global memory (after the last 2 FC layers) is the prediction kernel. We use fixed assignment (instead of bipartite matching).
  • We do not use bipartite matching for stuff classes. The stuff classes are trained with cross-entropy and VPQ-style losses.
  • Because we use a fixed assignment between the stuff memory and the stuff classes, the mask classification for thing and stuff is performed individually.
  • We just labeled the unlabeled regions as ‘unlabeled’ and the loss ignores those regions.
  • We are not sure yet when the code will be released.
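
For anyone re-implementing this, the fixed assignment and the ignored 'unlabeled' regions described above could look roughly like the NumPy sketch below. This is not the TubeFormer code (which is not released); the function name, the tensor shapes, and the ignore id 255 are assumptions made for illustration. It covers only a per-pixel cross-entropy term and leaves out the VPQ-style loss. Stuff memory slot k is tied to stuff class k, so no bipartite matching step is needed, and pixels marked as unlabeled are dropped before averaging.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed id for 'unlabeled' pixels; not taken from the paper


def fixed_assignment_stuff_loss(mask_logits, labels, ignore_label=IGNORE_LABEL):
    """Per-pixel cross entropy with slot k tied to stuff class k (hypothetical helper).

    mask_logits: float array of shape (num_stuff, H, W), one mask per stuff slot.
    labels: int array of shape (H, W) with ids in [0, num_stuff) or ignore_label.
    """
    valid = labels != ignore_label           # pixels that contribute to the loss
    if not valid.any():
        return 0.0                           # nothing labeled in this crop

    logits = mask_logits[:, valid]           # (num_stuff, N) over valid pixels only
    targets = labels[valid]                  # (N,) class id == slot index

    # Numerically stable log-softmax across the slot axis.
    logits = logits - logits.max(axis=0, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))

    # Fixed assignment: the target slot for each pixel is simply its class id.
    picked = log_probs[targets, np.arange(targets.size)]
    return float(-picked.mean())


if __name__ == "__main__":
    logits = np.zeros((3, 2, 2))
    labels = np.array([[0, 1], [2, IGNORE_LABEL]])
    print(fixed_assignment_stuff_loss(logits, labels))
```

With a bipartite-matching setup, the `targets` index would instead come from a Hungarian assignment between predicted and ground-truth masks; here it is read directly from the label map.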

Thanks.

lxtGH commented

Thanks for your reply! Dr. Dahun @mcahny, as I prepare to re-implement TubeFormer in PyTorch (mmdet), I would like to know the details of the mask-based tracking part. Did you use ViP-like mask-based tracking in an offline manner? Looking forward to your reply!