RobertLuo1/NeurIPS2023_SOC

About class prediction

Choi58 opened this issue · 2 comments

Choi58 commented

Dear authors,

I downloaded the scratch_tiny.tar file, which contains weights trained from scratch on ytvos.

When I ran an evaluation on Ref-DAVIS with it, the following error occurred:

RuntimeError: Error(s) in loading state_dict for SOC:
    size mismatch for class_embed.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in the current model is torch.Size([1, 256]).
    size mismatch for class_embed.0.bias: copying a param with shape torch.Size([]) from checkpoint, the shape in the current model is torch.Size([1]).
    size mismatch for class_embed.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in the current model is torch.Size([1, 256]).
    size mismatch for class_embed.1.bias: copying a param with shape torch.Size([]) from checkpoint, the shape in the current model is torch.Size([1]).
    size mismatch for class_embed.2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in the current model is torch.Size([1, 256]).
    size mismatch for class_embed.2.bias: copying a param with shape torch.Size([]) from checkpoint, the shape in the current model is torch.Size([1]).

The issue is that num_classes in davis.yaml differs from the number of classes in the checkpoint's class_embed layer.

This seems to be because the checkpoint's class_embed predicts multiple classes rather than just one.

Did you utilize class information during training, such as tiger or person?

Thank you.

Thanks for your interest in our work!
We apologize for not providing more information on Ref-DAVIS. First, we use the class information when training the Video-Swin-T model on Ref-Youtube-VOS, as mentioned in the training section of the README. For the Ref-DAVIS dataset, we directly use the checkpoint trained on Ref-Youtube-VOS and test on it without any finetuning. If the checkpoint was trained with multiple classes, you need to use the commented-out code in ./infer_davis.py, lines 149-153.
When performing With-Pretrain or Joint-Training, we set num_class=1, and the code in lines 149-153 can be ignored. In fact, we found that multi-class training does not contribute much. Feel free to set num_class to 1 when training on Ref-Youtube-VOS from scratch.
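
To tell which of the two cases applies to a downloaded checkpoint, it can help to compare the output size of its class_embed layers with the configured value before loading. The following is only a minimal sketch, not code from this repository: the helper names `class_head_sizes` and `check_num_classes` are hypothetical, the shapes in `example` are made up for illustration, and it assumes each class_embed bias is a 1-D tensor with one entry per class.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def class_head_sizes(shapes: Dict[str, Shape], prefix: str = "class_embed.") -> Dict[str, int]:
    """Map each `class_embed.<i>.bias` entry to its number of outputs."""
    sizes = {}
    for name, shape in shapes.items():
        # Only 1-D biases are considered: one entry per predicted class.
        if name.startswith(prefix) and name.endswith(".bias") and len(shape) == 1:
            sizes[name] = shape[0]
    return sizes


def check_num_classes(shapes: Dict[str, Shape], configured: int) -> List[str]:
    """Return readable mismatches; an empty list means the config matches."""
    problems = []
    for name, n in sorted(class_head_sizes(shapes).items()):
        if n != configured:
            problems.append(f"{name}: checkpoint has {n} outputs, config expects {configured}")
    return problems


# Hypothetical parameter shapes, for illustration only.
example = {
    "class_embed.0.weight": (5, 256),
    "class_embed.0.bias": (5,),
    "class_embed.1.weight": (5, 256),
    "class_embed.1.bias": (5,),
    "backbone.layer.weight": (96, 3, 3, 3),
}

for line in check_num_classes(example, configured=1):
    print(line)
```

For a real checkpoint, the `shapes` dictionary could be built from the loaded state dict with `{k: tuple(v.shape) for k, v in state_dict.items()}`. A non-empty result suggests a multi-class checkpoint, in which case the commented code in ./infer_davis.py mentioned above is the relevant path.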

Choi58 commented

Your response is sufficient!

Thank you.