florinshen/ULAST

some questions about cycle training in your paper


Sec. 3.5 (Online tracking) says that a memory queue of length Nl (= 6), storing historical search features and region masks, is maintained. It includes the features of the initial frame and the Nl − 1 (= 5) historical samples with the highest scores. In addition, the hidden template Ht is updated every Ns (= 10) frames using the highest-scoring frame in that short interval.

But Sec. 4.1 (Implementation details) says that just one reliable template frame and three search region frames with large temporal gaps are sampled from a video for cycle training.

So why Nl = 6 and Ns = 10? Don't they only apply within these three search frames of cycle training? I also couldn't find how these three frames are chosen from a video. Maybe three is just an example? In other words, are many sets of three frames taken from each video?

Thank you very much for your answer!

The inference stage is different from the training phase. In Sec 4.1, we elaborate on the inference phase: online tracking uses the trained CPT module to retrieve an online template kernel. In the training phase, a total of four frames with large temporal gaps are sampled from a single video. Because the gradients of all these frames are kept during training, 4 is the largest number that fits, occupying 14.7 Memory on each GPU.
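To make the split concrete, here is a rough Python sketch of the two settings discussed in this thread. It is not taken from the ULAST code. The names `sample_training_frames` and `OnlineMemory`, the `min_gap` value, and the way the hidden template is stored are all assumptions for illustration. In particular, the sketch only records which sample would drive the Ht update and does not model the CPT module.

```python
import random
from typing import Any, List, Optional, Tuple


def sample_training_frames(num_frames: int, num_samples: int = 4,
                           min_gap: int = 10,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Training side: pick sorted frame indices that are at least min_gap apart.

    The first index plays the role of the template frame and the remaining
    three are search frames. min_gap is a hypothetical parameter.
    """
    rng = rng or random.Random()
    slack = num_frames - 1 - (num_samples - 1) * min_gap
    if slack < 0:
        raise ValueError("video too short for the requested gaps")
    # Draw sorted offsets inside the slack, then push each one out by min_gap.
    offsets = sorted(rng.randint(0, slack) for _ in range(num_samples))
    return [off + i * min_gap for i, off in enumerate(offsets)]


class OnlineMemory:
    """Inference side: initial frame plus the n_l - 1 best-scoring samples."""

    def __init__(self, initial_sample: Any, n_l: int = 6, n_s: int = 10):
        self.initial = initial_sample
        self.n_l = n_l
        self.n_s = n_s
        self.history: List[Tuple[float, int, Any]] = []  # (score, frame, sample)
        self.interval_best: Optional[Tuple[float, int, Any]] = None
        self.hidden_template = initial_sample  # stand-in for Ht (assumption)
        self.frames_seen = 0

    def update(self, frame_idx: int, score: float, sample: Any) -> None:
        # Long-term queue: keep only the n_l - 1 highest-scoring samples.
        self.history.append((score, frame_idx, sample))
        self.history.sort(key=lambda entry: entry[0], reverse=True)
        del self.history[self.n_l - 1:]
        # Short-term interval: remember the best sample of the current window.
        if self.interval_best is None or score > self.interval_best[0]:
            self.interval_best = (score, frame_idx, sample)
        self.frames_seen += 1
        if self.frames_seen % self.n_s == 0:
            # Every n_s frames, refresh the hidden template from that sample.
            self.hidden_template = self.interval_best[2]
            self.interval_best = None

    def queue(self) -> List[Any]:
        """Initial frame first, then historical samples by descending score."""
        return [self.initial] + [entry[2] for entry in self.history]
```

In this sketch the sampler would be called once per training iteration per video, while `OnlineMemory.update` would be called once per tracked frame at test time, so Nl and Ns never interact with the four training frames.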