max_seq_len during inference
Jaswar opened this issue · 2 comments
Hi, I noticed that during inference the videos are padded to the length indicated by max_seq_len here. I wanted to ask: why is this padding happening? Would it be sufficient to pad as is done in the else case?
To give some context, I am attempting to measure ActionFormer's inference performance on videos of different lengths (similar to what you measured in table 3b, but for different feature lengths). As videos shorter than max_seq_len are padded to size max_seq_len, they all take exactly the same amount of time. I would therefore like to simply lower max_seq_len in thumos_i3d.yaml to the lowest allowable value (576) and then pass videos of sizes from 576 to 2304.
I have passed an example video of length 915 (video_validation_0000990.npy) through both configurations (max_seq_len set to 576 and 2304), which results in padded shapes of 1152 and 2304 respectively. The resulting network output is the same in both cases. Hence my question: is padding to max_seq_len necessary (at least for the THUMOS dataset)?
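For what it's worth, the padded shapes above are consistent with lengths being rounded up to the next multiple of max_seq_len (an inference from the reported numbers, not something I checked in the code):

```python
import math

def padded_len(t: int, max_seq_len: int) -> int:
    # Round the clip length up to the next multiple of max_seq_len.
    # This reproduces the shapes reported above: 915 -> 1152 (2 * 576)
    # and 915 -> 2304 (1 * 2304).
    return math.ceil(t / max_seq_len) * max_seq_len

print(padded_len(915, 576))   # 1152
print(padded_len(915, 2304))  # 2304
```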
Good point. I don't think padding to max_seq_len is necessary at inference time. You will still need to pad to a divisible size (see here). The results on the test set should be similar if not exactly the same.
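A minimal sketch of what "pad to a divisible size" could look like, assuming the only constraint is that the temporal length must be divisible by the model's total downsampling factor (the divisor value of 32 is a placeholder here, not taken from the repo):

```python
import math
import numpy as np

def pad_to_divisible(feats: np.ndarray, divisor: int = 32) -> np.ndarray:
    """Pad a (C, T) feature array along time so T is a multiple of divisor.

    Assumption: the feature pyramid halves the temporal length at each
    level, so the input only needs to be divisible by the cumulative
    stride (e.g. 32), rather than padded all the way to max_seq_len.
    """
    t = feats.shape[-1]
    target = math.ceil(t / divisor) * divisor
    return np.pad(feats, ((0, 0), (0, target - t)))

# A 915-step clip pads to 928 with divisor=32, far below max_seq_len=2304.
feats = np.random.randn(2048, 915)
print(pad_to_divisible(feats).shape)  # (2048, 928)
```

With this kind of padding, shorter videos genuinely cost less compute, which is what the timing experiment above needs.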
Okay, that makes sense. Thank you for the quick reply.