researchmm/Stark

Lightning model score head

elvindp opened this issue · 1 comments

Hi, thanks for your great work! I have some questions about the code.
Without a score, if the target is out of the FOV (i.e. outside the image), the Lightning model will always produce a wrong output coordinate.

  1. The Lightning model does not have a score head. Is this because the performance is bad? When I add a score (cls) head to the Lightning model and train it for 50 epochs, the performance is quite bad, whereas the ResNet-50 model performs well. Is this because the Lightning model is too thin?
  2. Why is only the cls head trained in stage 2 of STARK-ST training?
  3. Why is "sample target" done on the complete image? I saved the images and found large black padding regions.

@elvindp Hi, thanks for your appreciation of our work.
Q1: Why doesn't STARK-Lightning have a score head?
A1: First I want to explain why we chose to remove the decoder. The first reason is that the 6-layer decoder takes quite a long time to run and introduces a large number of parameters. The second reason is that the performance of STARK-S does not drop noticeably when the decoder is removed. (The qualitative comparison is shown in the image below.)
[Image: decoder_comparison]
In addition, score prediction relies heavily on the decoder. Once the decoder is removed, the input to the score head is just the original target query, which has not seen any information about the template or the search region. So, as you said, the performance is quite bad in this situation.
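To make the point about the target query concrete, here is a toy sketch in plain Python. It is not the STARK code: `score_head`, `cross_attend`, the 3-dimensional query, and the two "frames" are all hypothetical stand-ins. It only shows that a head fed the raw, input-independent query returns the same score for every frame, while a single cross-attention step over frame features makes the score depend on the input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_head(vec, weights, bias=0.0):
    """Hypothetical linear score head followed by a sigmoid."""
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)


def cross_attend(query, memory):
    """One query attending over memory tokens (scaled dot-product attention)."""
    scale = math.sqrt(len(query))
    logits = [sum(q * m for q, m in zip(query, tok)) / scale for tok in memory]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    return [sum(a * tok[d] for a, tok in zip(attn, memory))
            for d in range(len(query))]


# Assumed toy values: a learned query (the same for every frame) and head weights.
query = [0.5, -0.2, 0.1]
weights = [1.0, 2.0, -1.0]

# Two different "frames", each reduced to two feature tokens.
frame_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
frame_b = [[0.0, 0.0, 1.0], [-1.0, 0.5, 0.0]]

# Without a decoder the head only ever sees the query, so the frame is ignored.
no_dec_a = score_head(query, weights)
no_dec_b = score_head(query, weights)

# With a decoder-like step the query is mixed with frame features first.
dec_a = score_head(cross_attend(query, frame_a), weights)
dec_b = score_head(cross_attend(query, frame_b), weights)

print("no decoder:", no_dec_a, no_dec_b)
print("with cross-attention:", dec_a, dec_b)
```

In the no-decoder case no amount of training can make the score react to the frame, because nothing frame-dependent reaches the head.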
Q2: Why is only the cls head trained in stage 2 of STARK-ST training?
A2: We treat the cls head as an additional component on top of the original STARK-S because, for SOT, accurate box estimation (box head) is more important than determining whether the target is lost (cls head). If we trained the whole network end to end, the two heads might conflict.
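As a rough illustration of training only the cls head, the PyTorch sketch below freezes everything except one sub-module. `ToyTracker` and its layer names are hypothetical and much simpler than the real model. The parameter-name prefix `cls_head` is an assumption made for this example, not necessarily the name used in the repository.

```python
import torch
from torch import nn


class ToyTracker(nn.Module):
    """Hypothetical stand-in: shared features, a box head, and a cls head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.box_head = nn.Linear(8, 4)
        self.cls_head = nn.Linear(8, 1)

    def forward(self, x):
        feat = torch.relu(self.backbone(x))
        return self.box_head(feat), self.cls_head(feat)


model = ToyTracker()

# Stage-2 style setup: only parameters under "cls_head" stay trainable.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("cls_head")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

# One dummy step with a binary "target present / lost" label.
box_before = model.box_head.weight.clone()
x = torch.randn(16, 8)
labels = torch.randint(0, 2, (16, 1)).float()
_, logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
optimizer.step()

print(trainable)
```

Because the box head receives no gradient, the classification loss cannot disturb the box estimation learned in stage 1.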
Q3: Why is "sample target" done on the complete image?
A3: This part is actually the same as in PyTracking. When the tracked target is large, a large proportion of the cropped region is (black) padding. That is normal. Don't worry about it.
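The effect of target size on padding can be estimated with a few lines of arithmetic. The sketch below is a simplified approximation of a square crop centered on the target, not the repository's `sample_target` itself. `crop_geometry` is a hypothetical helper, and the search-area factor of 5, the image size, and the boxes are assumed values chosen for illustration.

```python
import math


def crop_geometry(im_w, im_h, bbox, search_area_factor):
    """Return (crop side length, fraction of the crop that lies outside the image).

    bbox is (x, y, w, h) with (x, y) the top-left corner, in pixels.
    """
    x, y, w, h = bbox
    # Square crop whose side scales with the target size.
    crop_sz = math.ceil(math.sqrt(w * h) * search_area_factor)
    x1 = round(x + 0.5 * w - 0.5 * crop_sz)
    y1 = round(y + 0.5 * h - 0.5 * crop_sz)
    x2, y2 = x1 + crop_sz, y1 + crop_sz
    # Part of the crop that overlaps the image; the rest must be padded.
    vis_w = max(0, min(x2, im_w) - max(x1, 0))
    vis_h = max(0, min(y2, im_h) - max(y1, 0))
    pad_fraction = 1.0 - (vis_w * vis_h) / (crop_sz * crop_sz)
    return crop_sz, pad_fraction


# Small target near the center of a 640x480 image: the crop fits in the image.
small_sz, small_pad = crop_geometry(640, 480, (300, 220, 40, 40), 5.0)

# Large target: the crop is far bigger than the image, so most of it is padding.
large_sz, large_pad = crop_geometry(640, 480, (120, 40, 400, 400), 5.0)

print(small_sz, small_pad)
print(large_sz, round(large_pad, 4))
```

With these assumed numbers the small target needs no padding, while the large target yields a 2000-pixel crop of which more than 90% falls outside the 640x480 image.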