exiawsh/StreamPETR

Question about training frames in Table 5 and Table 6 of the paper

Hi! I'm very interested in your excellent work.

I have questions about training frames in Table 5 and Table 6 of the paper.
For example, when training frames = 8 and Test = V in Table 5, does that mean training frames = batchsize (2) * len_memory (512) // num_propagated (128)?
Likewise, in Table 6, when number of frames = 4, does that mean number of frames = len_memory (512) // num_propagated (128)?

Please correct me if I have misunderstood anything.

Looking forward to your reply~ Thank you~

Hi,
In Table 5, we ablate the number of training frames.
We have two training methods:

  1. The sliding window method. You can set the training frames here:
    queue_length = 8 # sliding window training, set seq_mode = False in dataset
  2. The streaming method, where
    training frames = 40 (frames in one nuScenes scene) // seq_split_num
    seq_split_num=2, # streaming video training

In Table 6, we ablate the length of the memory bank:
    number of frames in the memory bank = len_memory (512) // num_propagated (128)
Please note that the number of frames in the memory bank is not the total length of temporal fusion. Because we use a recurrent approach with latent variables (query features) to propagate temporal information, the temporal context can extend to a longer sequence even though the memory bank only holds 4 frames.
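
The two integer divisions in the reply above can be checked with a small Python sketch. The function names are hypothetical and are not part of the StreamPETR codebase; the value 40 is the per-scene frame count quoted in the reply, and the other numbers are the ones mentioned in this thread.

```python
# Hypothetical helpers that only restate the arithmetic from the reply above.
FRAMES_PER_SCENE = 40  # frames in one nuScenes scene, as quoted in the reply


def streaming_training_frames(seq_split_num, frames_per_scene=FRAMES_PER_SCENE):
    """Training frames per sequence when each scene is split into seq_split_num parts."""
    return frames_per_scene // seq_split_num


def memory_bank_frames(len_memory, num_propagated):
    """Number of frames held in the memory bank (not the total temporal fusion length)."""
    return len_memory // num_propagated


print(streaming_training_frames(seq_split_num=2))               # 20
print(memory_bank_frames(len_memory=512, num_propagated=128))   # 4
```

Note that batch size does not appear in either formula.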

Thanks for your quick reply!

Got it!!

So if I want to reproduce the results in Table 5 (i.e., the streaming method, training frames=4), I should set seq_split_num=10.

BTW, in Table 1 of the paper, which setting were the ResNet-50 results reported with: the streaming setting or the sliding window?
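
As a rough check of that choice, the relation training frames = 40 // seq_split_num can be inverted. This is only a sketch with a hypothetical helper name, assuming 40 frames per scene as stated earlier in the thread; it is not code from the repository.

```python
def seq_split_num_for(target_frames, frames_per_scene=40):
    """Return a seq_split_num candidate and the training frames it actually yields.

    Assumes training frames = frames_per_scene // seq_split_num.
    The result is exact only when the integer divisions round-trip.
    """
    if not 1 <= target_frames <= frames_per_scene:
        raise ValueError("target_frames must be between 1 and frames_per_scene")
    split = frames_per_scene // target_frames
    return split, frames_per_scene // split


print(seq_split_num_for(4))  # (10, 4)
print(seq_split_num_for(8))  # (5, 8)
```

If the second value differs from the requested number of frames, no seq_split_num from this formula reproduces that target exactly.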

Yes. Note that we only provide one way to test: directly on the entire video (40 frames).

What we report in the paper is the sliding window method. That is an earlier version; we have switched to the streaming method now.

Got it!! I appreciate the quick reply~