MCG-NJU/MixFormerV2

Search factor for MixFormerV2 during training and testing

Kou-99 opened this issue · 4 comments

Kou-99 commented

Thanks for your great work! I notice that the training search factor in the YAML config for MixFormerV2-S is 4.5, while the testing search factor is 4.4 on LaSOT and 4.55 on GOT-10k and TrackingNet. Is there a reason for this inconsistency, and will the mismatched factor affect performance? Additionally, the center jitter for MixFormerV2-S is 4.5 (the same as its search factor); will this cause more partially visible targets during training and reduce performance, considering that MixFormerV2-B uses the same center jitter (4.5) but a larger search factor (5)?

The search factor is a hyper-parameter in both training and testing, as is the center jitter. We chose the training hyper-parameters according to the teacher model's settings. The search factors in training and testing do not have to be the same, and do not even have to be fixed. The search factor only determines how the test frame is cropped around the previous result box. It may affect performance slightly, but it is usually not the key factor in the overall performance level. For specific videos, it can be tuned to a more suitable value. It is even possible to design a smarter strategy for the search factor during inference, such as a dynamically adaptive one.
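
For illustration, here is a minimal sketch of how a search-factor-based crop is typically computed in MixFormer/STARK-style trackers; the function name, signature, and padding details are assumptions for this example, not the exact code in this repository:

```python
import math

import cv2
import numpy as np


def crop_search_region(frame, box, search_factor, output_size):
    """Illustrative sketch (not the repo's exact code): crop a square search
    region around the previous result box and resize it to the network input.

    frame: H x W x 3 image, box: (x, y, w, h) in pixels,
    search_factor: e.g. 4.5 or 5.0, output_size: e.g. 224 or 288.
    """
    x, y, w, h = box
    # The crop side length scales with the target size via the search factor.
    crop_sz = math.ceil(math.sqrt(w * h) * search_factor)
    cx, cy = x + 0.5 * w, y + 0.5 * h
    x1, y1 = int(round(cx - 0.5 * crop_sz)), int(round(cy - 0.5 * crop_sz))
    x2, y2 = x1 + crop_sz, y1 + crop_sz

    # Zero-pad where the crop extends beyond the frame borders.
    pad_x1, pad_y1 = max(0, -x1), max(0, -y1)
    pad_x2, pad_y2 = max(0, x2 - frame.shape[1]), max(0, y2 - frame.shape[0])
    patch = frame[y1 + pad_y1:y2 - pad_y2, x1 + pad_x1:x2 - pad_x2]
    patch = np.pad(patch, ((pad_y1, pad_y2), (pad_x1, pad_x2), (0, 0)))

    # Resize the square crop to the fixed input size; return the scale so the
    # predicted box can be mapped back to frame coordinates.
    resized = cv2.resize(patch, (output_size, output_size))
    return resized, output_size / crop_sz
```

Changing the search factor at test time only changes `crop_sz` above, i.e. how much context around the previous box the network sees; the network input size stays fixed.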

Kou-99 commented

Thanks for your quick reply! To make sure I have everything right, here is my understanding:

  1. The student shares the same set of hyper-parameters with the teacher.
  2. During training, for Base: template/search size: 128/288, template/search factor: 2/5.
  3. During training, for Small: template/search size: 112/224, template/search factor: 2/5.
  4. During testing, the search factor is carefully tuned for each dataset, whereas the search size is fixed to the size used in training.

Please correct me if anything is wrong, thanks!

Yes, except that for the Small model, the training search factor is 4.5.
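
For convenience, the settings confirmed in this thread can be summarized as follows (a hedged sketch; the dictionary keys are illustrative and may not match the exact fields in the repo's YAML configs):

```python
# Illustrative summary of the settings discussed above; key names are
# assumptions, not the exact fields in the repository's YAML files.
MIXFORMERV2_TRAIN_SETTINGS = {
    "base":  {"template_size": 128, "search_size": 288,
              "template_factor": 2.0, "search_factor": 5.0},
    "small": {"template_size": 112, "search_size": 224,
              "template_factor": 2.0, "search_factor": 4.5},  # 4.5, not 5.0
}
# At test time the search size stays the same, while the search factor is
# tuned per dataset, e.g. for MixFormerV2-S: 4.4 (LaSOT), 4.55 (GOT-10k,
# TrackingNet).
```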

Kou-99 commented

Got it. Thanks for your reply! Good luck with your submission!