microsoft/computervision-recipes

[FEATURE_REQUEST] Negative sample training in action recognition, and how to integrate Detectron2 or another object detector so that the model focuses more on the person's action

vigneshgig opened this issue · 5 comments

Description

Hi, thanks for the great work. I have trained the model on my own dataset, which has just 15 classes, each with 1 hour of positive clips. I first trained with the full-network config, and when I tested it the results were poor. I then trained again with the last-layer-only config, and now I get some good results. I am planning to enlarge the dataset, but the problem is that from the start of a video, when no action is happening, the model starts predicting random classes. The already pretrained model, in contrast, is able to somewhat reject such unwanted scenes. If you used negative clips to reject unwanted scenes, please let me know how to train with negative clips, or whether there is any other way to reject unwanted scenes.
Second question:
How can I add a pre-trained object detector so that the model focuses on an individual person's action, or on both the person and the object needed to perform a particular action?

For example, I have seen this great repository:
https://github.com/facebookresearch/SlowFast

Any ideas or suggestions would be very helpful.

Thanks

One way forward could be to add a new class, called e.g. "negative", and fill it with a large number of clips from similar videos that do not include the actions you are interested in.
As for person detection, consider using this pre-trained person detector to crop the frames to the area of interest: https://github.com/microsoft/computervision-recipes/tree/master/scenarios/keypoints
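
To illustrate the cropping idea, here is a minimal sketch in plain Python. It assumes the person detector returns one box per frame as `(x1, y1, x2, y2)` pixel coordinates; the helper names (`union_box`, `pad_and_clip`, `crop_frame`) and the `margin` parameter are hypothetical and not part of this repository. Taking the union of the boxes over a clip keeps the crop window fixed across frames, so the person's motion is preserved inside the crop.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; assumed detector output format


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that contains every per-frame person box of a clip."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def pad_and_clip(box: Box, frame_w: int, frame_h: int, margin: float = 0.25) -> Box:
    """Grow the box by `margin` of its size on each side, then clip it to the frame."""
    x1, y1, x2, y2 = box
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(frame_w, x2 + pad_x),
        min(frame_h, y2 + pad_y),
    )


def crop_frame(frame: List[list], box: Box) -> List[list]:
    """Crop a frame stored as a list of pixel rows.

    With a NumPy array the equivalent would be frame[y1:y2, x1:x2].
    """
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


# Example: two per-frame detections for the same person in a 64x100 clip.
detections = [(20, 20, 50, 100), (30, 30, 60, 90)]
clip_box = pad_and_clip(union_box(detections), frame_w=64, frame_h=100)
dummy_frame = [[0] * 64 for _ in range(100)]  # 100 rows, 64 columns
cropped = crop_frame(dummy_frame, clip_box)
```

The same `clip_box` would be applied to every frame of the clip before passing it to the action recognition model, both at training and at prediction time.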

Thanks for the reply @PatrickBue
I will try your suggestion, but I would like to know how the pre-trained model rejects unwanted scenes. Is there an algorithm for this, or do we just have to train the model on more data so that it ignores unwanted scenes at prediction time? I ask because when I tested the pre-trained model with a webcam, it didn't predict anything while I was standing still with no action. But when I tested my own model, trained on my own dataset, it continuously predicted random classes even though I was standing still with no action. It is, however, able to predict the correct action when I actually perform one, such as sneezing or jumping.

Your best approach would be to add more (negative) training examples. You can of course also try to reduce over-fitting (e.g. as you did by only training the last layer) but there is no guarantee that this will indeed make the model over-fire less.
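
As a rough sketch of what the extra class could look like on the data side, the snippet below builds a label map with a "negative" entry, turns a per-class clip listing into `(clip, label)` pairs, and maps a predicted index back to an action name, treating "negative" as "no action". All names here (`NEGATIVE`, `build_label_map`, `make_annotations`, `to_action`, the clip file names) are hypothetical and are not this repository's API; how the annotations are fed into training depends on the data loader you use.

```python
from typing import Dict, List, Optional, Tuple

NEGATIVE = "negative"  # hypothetical name for the "no action of interest" class


def build_label_map(action_classes: List[str]) -> Dict[str, int]:
    """Assign an index to each action class and put the negative class last."""
    classes = sorted(set(action_classes) - {NEGATIVE}) + [NEGATIVE]
    return {name: idx for idx, name in enumerate(classes)}


def make_annotations(
    clips_by_class: Dict[str, List[str]], label_map: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Flatten {class name: [clip paths]} into (clip path, label index) pairs."""
    return [
        (clip, label_map[name])
        for name, clips in sorted(clips_by_class.items())
        for clip in clips
    ]


def to_action(pred_idx: int, label_map: Dict[str, int]) -> Optional[str]:
    """Map a predicted index to an action name; the negative class becomes None."""
    idx_to_name = {idx: name for name, idx in label_map.items()}
    name = idx_to_name[pred_idx]
    return None if name == NEGATIVE else name


# Example with two action classes plus clips where the person just stands still.
label_map = build_label_map(["sneezing", "jumping"])
annotations = make_annotations(
    {
        "jumping": ["jump_01.mp4"],
        "sneezing": ["sneeze_01.mp4"],
        NEGATIVE: ["standing_01.mp4", "walking_past_01.mp4"],
    },
    label_map,
)
```

At prediction time, a clip whose top class is "negative" is then simply reported as no action (`to_action` returns `None`).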

Thanks, OK, I will try it.

Closing due to inactivity.