MCG-NJU/VideoMAE

BUG: Incorrect temporal indexing?

rosenfeldamir opened this issue

The function loadvideo_decord samples frames from the video according to the clip length (clip_len) and the frame sample rate (frame_sample_rate).
The start of the clip is randomized; for simplicity, assume the first frame is 0.
Also assume that clip_len is 4 and frame_sample_rate is 6.
I expect to get frames 0, 6, 12, 18.
However, I get frames 0, 8, 16, 24, which means the effective frame_sample_rate is 8!
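A minimal sketch of the arithmetic behind this, assuming (as the converted_len line quoted further down suggests) that the clip indices are spread evenly with np.linspace over a window of clip_len * frame_sample_rate frames starting at frame 0. The helper name sample_indices is hypothetical, and any clipping the real function applies to the indices is not modeled:

```python
import numpy as np


def sample_indices(clip_len: int, frame_sample_rate: int) -> np.ndarray:
    """Hypothetical helper: spread clip_len indices evenly over a window
    of clip_len * frame_sample_rate frames that starts at frame 0."""
    converted_len = clip_len * frame_sample_rate  # 4 * 6 = 24
    # np.linspace includes both endpoints, so the step is
    # converted_len / (clip_len - 1) = 24 / 3 = 8 rather than 6.
    return np.linspace(0, converted_len, num=clip_len).astype(np.int64)


indices = sample_indices(clip_len=4, frame_sample_rate=6)
print(indices.tolist())  # [0, 8, 16, 24]
```

Because linspace places clip_len points with clip_len - 1 gaps between them, a window of clip_len strides stretches each gap beyond frame_sample_rate.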

def loadvideo_decord(self, sample, sample_rate_scale=1):

This also happens for the more "conventional" setting of frame_sample_rate=4 and clip_len=16, as used in the script for vit_large.

Here, np.diff(index) returns array([4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4]) because the code spreads 16 frames over a range of 64 frames, whereas 16 frames at a stride of 4 span only (16 - 1) * 4 = 60 frames.
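The uneven 4/5 pattern can be reproduced with a short sketch, again assuming the indices come from np.linspace over clip_len * frame_sample_rate frames starting at 0. The step is 64 / 15 ≈ 4.27, and truncating to integers yields a mix of 4s and 5s. This sketch does not reproduce whatever end-of-window handling the real function applies, so its final step comes out as 5 rather than the 4 reported above:

```python
import numpy as np

clip_len, frame_sample_rate = 16, 4

# Window of clip_len strides: 16 points spread over 64 frames.
span = clip_len * frame_sample_rate
# Fractional step 64 / 15 ≈ 4.27; astype truncates each index toward zero.
index = np.linspace(0, span, num=clip_len).astype(np.int64)

print(np.diff(index).tolist())
# [4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 5]
```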
I suggest fixing this by changing the line
converted_len = int(self.clip_len * self.frame_sample_rate)
to
converted_len = int((self.clip_len - 1) * self.frame_sample_rate)
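A sketch of the proposed change, under the same np.linspace assumption as above and again ignoring any clipping of the indices that the surrounding code may apply. With a window of (clip_len - 1) strides, linspace's step is exactly frame_sample_rate, so the indices match the intended stride sequence. The helper name fixed_indices and its start parameter are hypothetical:

```python
import numpy as np


def fixed_indices(clip_len: int, frame_sample_rate: int, start: int = 0) -> np.ndarray:
    """Hypothetical helper implementing the proposed converted_len."""
    # clip_len frames have clip_len - 1 gaps between them.
    converted_len = (clip_len - 1) * frame_sample_rate
    return np.linspace(start, start + converted_len, num=clip_len).astype(np.int64)


for clip_len, rate in [(4, 6), (16, 4)]:
    idx = fixed_indices(clip_len, rate)
    expected = np.arange(clip_len) * rate  # 0, rate, 2 * rate, ...
    print(clip_len, rate, np.array_equal(idx, expected))  # prints True for both
```

Since the step is an exact integer here, the result does not depend on floating-point truncation, which is what produced the irregular 4/5 steps in the original version.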
This frame sampling is at the very core of VideoMAE. Please correct me if I'm wrong or have misunderstood something.