wudongming97/RMOT

About the initialization of detect query.

Hi, I just have a small question. Did you use the text feature to initialize the detect query in the decoder?
https://github.com/wudongming97/RMOT/blob/cb5fd35364a078f355c102358fddc682f37af786/models/transrmot.py#L648C1-L648C1

No, I don't use the text feature to initialize the detect query in the decoder. I did try that approach, but it gave no performance improvement.

I'm sorry, my mistake. I misinterpreted the order of the parameters, leading me to believe you were initializing detect query with text features.

Another small issue: I received the following warnings during training. Is this normal, or does it mean that I failed to load the pretrained Deformable DETR model and the RoBERTa model?

No param track_embed.self_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.self_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param text_encoder.encoder.layer.1.attention.output.LayerNorm.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param text_encoder.encoder.layer.1.intermediate.dense.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.

However, I believe I loaded the correct pretrained checkpoint, r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth from Deformable-DETR, and I use the code below to load RoBERTa.

RMOT/models/transrmot.py

Lines 472 to 473 in cb5fd35

# self.tokenizer = RobertaTokenizerFast.from_pretrained(text_encoder_type)
# self.text_encoder = RobertaModel.from_pretrained(text_encoder_type)

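The loading is correct. Both groups of warnings are expected:

  1. track_embed is not defined in Deformable DETR, so its checkpoint has no weights for it.
  2. text_encoder comes from ReferFormer, not from the Deformable DETR checkpoint, so this is not a problem.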
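
To check warnings like these yourself, you can separate the parameter names missing from the checkpoint into those you expect to be missing and those you don't. The sketch below is a minimal illustration in plain Python, not code from RMOT: `split_missing_keys` is a hypothetical helper, the prefixes are assumed from the reply above, and every key except the `track_embed` and `text_encoder` names quoted in the warnings is made up.

```python
def split_missing_keys(model_keys, checkpoint_keys, expected_prefixes):
    """Split model keys absent from the checkpoint into (expected, unexpected).

    Hypothetical helper for illustration; not part of RMOT or Deformable DETR.
    """
    checkpoint_keys = set(checkpoint_keys)
    prefixes = tuple(expected_prefixes)
    expected, unexpected = [], []
    for key in model_keys:
        if key in checkpoint_keys:
            continue  # weight is present in the checkpoint, nothing to report
        if key.startswith(prefixes):
            expected.append(key)  # module the checkpoint is known not to contain
        else:
            unexpected.append(key)  # worth investigating
    return expected, unexpected


# Only the track_embed / text_encoder names come from the warnings above;
# the other keys are invented for the example.
model_keys = [
    "transformer.encoder.layers.0.linear1.weight",  # hypothetical
    "track_embed.self_attn.in_proj_weight",
    "track_embed.self_attn.in_proj_bias",
    "text_encoder.encoder.layer.1.intermediate.dense.weight",
    "class_embed.0.weight",  # hypothetical
]
checkpoint_keys = ["transformer.encoder.layers.0.linear1.weight"]

# Assumption: these two prefixes are the ones described as safe to ignore.
expected, unexpected = split_missing_keys(
    model_keys, checkpoint_keys, ("track_embed.", "text_encoder.")
)
print("expected to be missing:", expected)
print("unexpected:", unexpected)
```

If the second list is empty, the warnings only cover modules that the checkpoint was never going to contain. With a real model, the key lists would come from the `state_dict()` of the model and of the loaded checkpoint.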

Thank you so much! Now I can just ignore these warnings.