woodfrog/ActionRecognition

optical flow images


Can you tell me how you are feeding in the optical flow images that you computed from the video frames?
I am a bit confused.

Hi,
Basically, we compute the optical flow between every two adjacent frames and then stack the results to form the optical flow for the whole sequence. So with 10 frames we get 9 optical flow images, each of shape (im_size x im_size x 2), and the stacked optical flow image has shape (im_size x im_size x 18).
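To make the shapes concrete, here is a minimal NumPy sketch of the stacking step, assuming the per-pair flow fields have already been computed by some dense optical flow method (for example OpenCV's `cv2.calcOpticalFlowFarneback`; the thread does not say which one the repository uses). The function name `stack_flows`, the value of `im_size`, and the random stand-in data are assumptions for illustration only.

```python
import numpy as np


def stack_flows(flows):
    """Concatenate per-pair flow fields of shape (H, W, 2) along the channel axis.

    Hypothetical helper: channel 2*k holds the x-displacement and channel
    2*k + 1 the y-displacement of the k-th frame pair.
    """
    flows = [np.asarray(f, dtype=np.float32) for f in flows]
    if not flows:
        raise ValueError("need at least one flow field")
    first_shape = flows[0].shape
    for f in flows:
        if f.ndim != 3 or f.shape[-1] != 2 or f.shape != first_shape:
            raise ValueError("every flow field must have the same (H, W, 2) shape")
    return np.concatenate(flows, axis=-1)


# Stand-in data: 10 frames give 9 adjacent pairs, hence 9 flow fields.
im_size = 64        # assumed value, not taken from the repository
num_frames = 10
rng = np.random.default_rng(0)
flows = [
    rng.standard_normal((im_size, im_size, 2)).astype(np.float32)
    for _ in range(num_frames - 1)
]

stacked = stack_flows(flows)
print(stacked.shape)  # (64, 64, 18)
```

The stacked array can then be fed to the motion stream as a single 18-channel input; whether the network expects channels last or channels first depends on the framework configuration.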

How are you cropping? If you could just tell me whether this is right or wrong, it would be a great help.
For running, I need to crop the human and resize the crop so that in each frame the person appears to stay at one point, moving only their hands and legs.

For bending, jumping, pulling, and handshake, I need to use the same boundary for all the frames where the motion is taking place, right?
[two screenshots attached]

I think the frames in your screenshot look good :). After you crop the human and resize all frames to the same size, you can compute the stacked optical flow.
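For the shared-boundary case, a rough sketch of the crop-then-resize step could look like the following. It assumes you already have one person bounding box per frame as (x0, y0, x1, y1) from some detector (hypothetical here), takes their union as the common crop, and uses a simple nearest-neighbour resize in NumPy as a stand-in for a library call such as `cv2.resize`. The helper names `union_box` and `crop_and_resize` are made up for this example.

```python
import numpy as np


def union_box(boxes):
    """Smallest (x0, y0, x1, y1) box that contains every per-frame box."""
    boxes = np.asarray(boxes)
    return (
        int(boxes[:, 0].min()),
        int(boxes[:, 1].min()),
        int(boxes[:, 2].max()),
        int(boxes[:, 3].max()),
    )


def crop_and_resize(frame, box, im_size):
    """Crop `frame` to `box`, then nearest-neighbour resize to im_size x im_size."""
    x0, y0, x1, y1 = box
    crop = frame[y0:y1, x0:x1]
    # Map each output row/column back to a source row/column of the crop.
    rows = np.arange(im_size) * crop.shape[0] // im_size
    cols = np.arange(im_size) * crop.shape[1] // im_size
    return crop[rows[:, None], cols]


# Stand-in data: two grayscale frames and hypothetical per-frame person boxes.
frames = [np.zeros((120, 160), dtype=np.uint8) for _ in range(2)]
boxes = [(40, 20, 80, 100), (50, 22, 95, 104)]

shared = union_box(boxes)
resized = [crop_and_resize(f, shared, 64) for f in frames]
print(shared)            # (40, 20, 95, 104)
print(resized[0].shape)  # (64, 64)
```

For an action like running, where the person moves across the image, passing each frame's own box instead of the shared one keeps the person centred in every crop.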

Can I implement a two-stream convolutional neural network on multi-view camera video datasets?
The dataset I am using (I3DPost) has actions captured from 8 different angles.
Sorry to ask this silly question. I am a newbie in this field.

The camera is fixed, but the subject performs the action facing 8 different directions (facing the camera, facing away from the camera, etc.).

I think the two-stream model assumes the input optical flow comes from a single video sequence. It's not a good idea to stack flows extracted from 8 videos taken from different angles, since together they cannot be considered one complete sequence.
If you want to use the model on multi-view data, I guess a better way is to train one model per view (assuming you know which sequence is taken from which view in the dataset), and then use something like a voting mechanism to give the final prediction.
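As one possible reading of "something like a voting mechanism", here is a minimal majority-vote sketch over per-view predictions. The function name `majority_vote` and the example labels are hypothetical, and this is only one option: averaging the per-view class scores before taking the argmax would be another.

```python
from collections import Counter


def majority_vote(view_predictions):
    """Return the label predicted by the most views.

    Ties are broken in favour of the label that appears first in the input.
    """
    if not view_predictions:
        raise ValueError("need at least one prediction")
    return Counter(view_predictions).most_common(1)[0][0]


# Stand-in predictions from 8 per-view models for the same action clip.
view_predictions = [
    "handshake", "handshake", "pull", "handshake",
    "bend", "handshake", "pull", "handshake",
]
print(majority_vote(view_predictions))  # handshake
```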

OWH99 commented

Hi,
Basically, we compute the optical flow between every two adjacent frames and then stack the results to form the optical flow for the whole sequence. So with 10 frames we get 9 optical flow images, each of shape (im_size x im_size x 2), and the stacked optical flow image has shape (im_size x im_size x 18).

Hi, may I know the exact mechanism you used for stacking? I have generated the images, but I am not sure how to stack them and feed them into the motion model.

wuao commented