How to select correct NUM_FRAMES and SAMPLING_RATE for a different frame rate?
qwangku opened this issue · 4 comments
Thanks for sharing this great resource. I just want to confirm whether my understanding of NUM_FRAMES=32 and SAMPLING_RATE=2 is correct. I hope to reduce the "frame rate" for my application to see how much the prediction performance degrades, and I found these two parameters in the yaml file.
For NUM_FRAMES=32, does that mean the dataloader will pick every 32nd frame from the entire video clip if the video was saved as 320 frames? For example, the selected indices would be 1, 32, 64, 96, 128, ..., 320 (32 frames in total).
But if I change to NUM_FRAMES=16, the dataloader will pick every 64th frame, and the selected frames will be 1, 64, 128, 192, ..., 320 (16 frames in total, fewer than 32, but still representing the entire video [1 ~ 320]). Is my understanding correct?
However, what is this SAMPLING_RATE=2 in this setup? Could someone provide some guidance on this?
Hello,
If I've understood the code correctly, at training and validation time, NUM_FRAMES will be the number of consecutive frames that will be taken from a video clip. If SAMPLING_RATE>1, then the actual total number of frames will be NUM_FRAMES*SAMPLING_RATE.
So, as an example, imagine you have a video clip that is 320 frames total. If NUM_FRAMES=32 and SAMPLING_RATE=2, during training and validation the data loader will:
- randomly select an index between 0 and 320-(NUM_FRAMES*SAMPLING_RATE) as the start frame, let's say 103 in this example.
- select SAMPLING_RATE*NUM_FRAMES frames starting at frame 103. You would get frames 103,104,105,...,166.
Note: all of this assumes that the original video clip had a sampling rate of 60fps.
Maybe someone from the EK team can confirm? @ekazakos
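The two steps above can be sketched in a few lines of plain Python. This is only an illustration of the description in this comment, not the repository's actual code; the helper names `random_start` and `window_bounds` are made up for the example.

```python
import random


def random_start(total_frames, num_frames, sampling_rate, rng=random):
    """Pick a random start index so the whole window fits inside the clip."""
    window = num_frames * sampling_rate
    if window > total_frames:
        raise ValueError("clip is shorter than NUM_FRAMES*SAMPLING_RATE")
    # randint is inclusive on both ends: 0 .. total_frames - window
    return rng.randint(0, total_frames - window)


def window_bounds(start, num_frames, sampling_rate):
    """First and last (inclusive) frame index of the consecutive window."""
    window = num_frames * sampling_rate
    return start, start + window - 1


# Hypothetical values taken from the example in this thread.
start = random_start(320, 32, 2)       # some index in 0..256
print(window_bounds(start, 32, 2))
print(window_bounds(103, 32, 2))       # (103, 166)
```

With a start frame of 103 the window ends at 166, which covers 64 consecutive frames, as in the example.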
Actually, sorry, but I forgot one step:
- In the example I described, you will actually get NUM_FRAMES frames uniformly spaced between frame indices 103 and 166.
I hope this helps (and that I'm actually not wrong somehow).
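As a rough sketch of that last step, here is one way to pick NUM_FRAMES evenly spread indices from the window. The function name `uniform_indices` is hypothetical, and the integer rounding used here is an assumption; the repository may round differently, so individual indices could be off by one.

```python
def uniform_indices(start, end, num_frames):
    """Return num_frames indices spread evenly over [start, end], inclusive."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [start]
    span = end - start
    # Integer arithmetic avoids floating-point drift at the last index.
    return [start + (i * span) // (num_frames - 1) for i in range(num_frames)]


# Window from the example: frames 103..166, NUM_FRAMES=32.
indices = uniform_indices(103, 166, 32)
print(len(indices), indices[0], indices[-1])   # 32 103 166
```

The resulting indices are roughly SAMPLING_RATE frames apart, starting at 103 and ending at 166.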
Hi,
@iranroman's interpretation is perfectly correct. The dataloader samples a chunk of NUM_FRAMES*SAMPLING_RATE consecutive frames, from which NUM_FRAMES equidistant frames, sampled with SAMPLING_RATE, are fed into the model.
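For the original question about lowering the frame rate, the arithmetic implied by this description can be sketched as follows. It assumes a 60 fps source video, as noted earlier in the thread, and treats the spacing between sampled frames as approximately SAMPLING_RATE; `clip_coverage` is a hypothetical helper, not part of the repository.

```python
def clip_coverage(num_frames, sampling_rate, source_fps=60.0):
    """Summarize what one sampled clip covers, under the assumptions above."""
    window = num_frames * sampling_rate
    return {
        "window_frames": window,                       # consecutive frames read
        "window_seconds": window / source_fps,         # time span of the window
        "effective_fps": source_fps / sampling_rate,   # approximate rate seen by the model
    }


# Default setting from the thread: 64-frame window, about 1.07 s, ~30 fps.
print(clip_coverage(32, 2))
# Halving NUM_FRAMES and doubling SAMPLING_RATE keeps the same time span
# but halves the approximate frame rate seen by the model (~15 fps).
print(clip_coverage(16, 4))
```

Under these assumptions, changing only NUM_FRAMES shortens the time window, while changing SAMPLING_RATE changes how densely that window is sampled.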
Hello,
Thanks for the explanation. I'm looking to run inference with this model. Do you have a script for inference only? I would deeply appreciate it if you could help me out with this.