woodfrog/ActionRecognition

About data preprocessing


Hi, I am interested in your repo ActionRecognition and am trying to run the code.
However, I have a question about the Preprocessing section of your 'experiment.md' file:

3. Segment number of frames into equal size blocks(frame number/sequence
length L). Randomly select one frame from each block to compose L length
video clip

Does this mean that if a video has 35 frames and L is 5,
it is divided into blocks 1-5, 6-10, ..., 31-35 (7 blocks), and one
random frame is taken from each block, first to last, to make a 7-frame video?
If my understanding is right, I'd like to know why it is done this way.
Also, how is the sequence length L determined? Is it fixed? Which value did you choose?

P.S. If possible, could you let me know the threshold for discarding videos with too few frames, as mentioned in the step below?

2. Extract video to 5 FPS and down sample resolution for each video
and discard videos with too few frames

Hello, thank you for your interest 😄.

L is the length of each video clip after preprocessing, and we set it to a fixed number for simplicity. In your example, the original video, which contains 35 frames (at FPS=5), is divided into 5 chunks of 7 frames each; we then randomly pick one frame from each chunk to compose a 5-frame video clip. We chose L=10 in our experiments.

As for the threshold, if L=10, then an original video containing fewer than 10 frames (after converting to FPS=5) is discarded.
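
For anyone reading along, here is a minimal Python sketch of the sampling and discard rule described above. It is not the repository's actual code: the function name `sample_clip_indices` and its parameters are hypothetical, frame indices are 0-based, and the way block boundaries are rounded when the frame count is not a multiple of L is an assumption, since that case is not specified in this thread.

```python
import random


def sample_clip_indices(num_frames, seq_len=10, rng=None):
    """Pick one random frame index from each of seq_len contiguous blocks.

    Hypothetical helper, not taken from the repository.
    Returns None when the video has fewer than seq_len frames,
    mirroring the discard rule described in the thread.
    """
    if num_frames < seq_len:
        return None  # too few frames: the video would be discarded

    rng = rng or random.Random()

    # Block i covers indices [bounds[i], bounds[i + 1]).
    # Assumption: integer rounding spreads leftover frames across blocks
    # when num_frames is not divisible by seq_len.
    bounds = [i * num_frames // seq_len for i in range(seq_len + 1)]

    # One random frame per block, in temporal order.
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(seq_len)]


# The example from this thread: 35 frames, L=5 gives 5 blocks of 7 frames.
print(sample_clip_indices(35, seq_len=5))
```

Because each index is drawn from its own block, the selected frames stay in temporal order and span the whole video.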