Sense-X/UniFormer

Spatiotemporal action detection

yan-ctrl opened this issue · 10 comments

Hello, thank you for your work. I would like to ask how to apply this work to the AVA dataset for spatiotemporal action detection.

Sorry, I have not run AVA. However, I think you can follow VideoMAE to run it: they forked AlphAction to run AVA. Just copy the model and reuse their repo!

Good, thank you for your recommendation, but I'm afraid I don't have enough GPUs to run VideoMAE.

Yes. My suggestion is that you copy the UniFormer model into that codebase and run it, just like plugging a backbone into MMDetection/MMSegmentation.
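
For illustration, this is the kind of pattern that comparison refers to: in MMDetection/MMSegmentation you register the copied backbone so a config can select it by name. The sketch below uses the MMDetection 2.x registry; `UniFormerBackbone` and its trivial stub body are placeholders for the ported UniFormer blocks, and AlphAction itself wires models up differently.

```python
# Hypothetical sketch of the MMDetection 2.x backbone-registration pattern.
# `UniFormerBackbone` is a placeholder; the real UniFormer blocks would be
# copied into the class body. AlphAction does not use this registry.
import torch.nn as nn
from mmdet.models.builder import BACKBONES


@BACKBONES.register_module()
class UniFormerBackbone(nn.Module):
    def __init__(self, in_channels=3, embed_dim=64):
        super().__init__()
        # Stand-in stem so the example runs; replace with the UniFormer stages.
        self.stem = nn.Conv2d(in_channels, embed_dim, kernel_size=4, stride=4)

    def forward(self, x):
        # mm* detectors expect a tuple of (multi-scale) feature maps.
        return (self.stem(x),)
```

A config would then pick it up with `backbone=dict(type='UniFormerBackbone', embed_dim=64)`; the same copy-the-model idea applies when moving the backbone into the AlphAction-based repo.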

Oh, you mean let me pre-train the model in your work or VideoMAE, and then fine-tune my own model in the AlphAction library.

Yes. The above repo is based on AlphAction, and you can reuse their hyperparameters for transformer-based models. If you want to use UniFormer or other efficient backbones, you can port your model code into that repo as done here (you may need to add ROI pooling).
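
To make the "add ROI pooling" part concrete, here is a minimal, hedged sketch in PyTorch of a detector-style head on top of a video backbone: temporally pool the backbone features, extract per-person features with `roi_align` over the keyframe boxes, and classify each actor. The class names, feature stride, and tensor shapes are assumptions, not the actual AlphAction / VideoMAE-Action-Detection API.

```python
# Hypothetical sketch: ROIAlign pooling over person boxes on top of a video
# backbone, as an AVA-style action-detection head. Shapes, stride, and the
# backbone interface are illustrative assumptions, not the AlphAction API.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class VideoROIHead(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes, roi_size=7, spatial_stride=16):
        super().__init__()
        self.backbone = backbone                   # assumed to output [B, C, T', H', W']
        self.spatial_scale = 1.0 / spatial_stride  # maps input pixels -> feature-map coords
        self.roi_size = roi_size
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips, boxes):
        # clips: [B, 3, T, H, W]; boxes: list of per-clip person boxes, each [N_i, 4]
        # in (x1, y1, x2, y2) pixel coordinates of the keyframe.
        feats = self.backbone(clips)               # [B, C, T', H', W']
        feats = feats.mean(dim=2)                  # average over time -> [B, C, H', W']
        rois = roi_align(feats, boxes,
                         output_size=self.roi_size,
                         spatial_scale=self.spatial_scale,
                         aligned=True)             # [sum(N_i), C, roi, roi]
        rois = rois.mean(dim=(2, 3))               # global average pool -> [sum(N_i), C]
        return self.classifier(rois)               # per-person action logits
```

With the backbone swapped in this way, the rest of the AlphAction-based training recipe (sampling, box handling, schedules) can stay as it is.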

Thank you for your patience, but I still have questions. In https://github.com/MCG-NJU/VideoMAE-Action-Detection, although AlphAction is used, the provided pre-trained models are based on ViT and SlowFast is not used as the backbone network, so:

  1. Is the role of AlphAction just to provide the detection head? Do all models need to be based on ViT?

  2. Regarding VideoMAE-Action-Detection/modeling_finetune.py: if I use other backbones, how can I conduct MAE training?

  1. The repo is used for training an action detection model on top of Kinetics-pretrained backbones.
  2. Your original question was how to apply UniFormer to the AVA dataset. In my opinion, you can reuse their repo and add the UniFormer model. Why do you want to conduct MAE training?

Well, because I want to apply it to my own tasks, I need to build a custom dataset in the AVA format, and labeling is troublesome; I can't label a dataset as large as AVA, so I want to see whether self-supervised learning can help. As you said, I can pre-train the parameters of the backbone network on the Kinetics dataset and transfer them to the downstream task. But MAE uses ViT as its backbone, and your work is also a good backbone, so I asked whether AlphAction only plays the role of evaluating the MAE model on action detection (like the classification head of an image segmentation network), or whether, as you said, I can use the UniFormer model and then reuse the AlphAction repo.

Q: "So I asked you if AlphAction only plays the role of using motion detection to evaluate the MAE model."
A: AlphAction is a general codebase for training action detection models. It's not only used for the MAE model. You can use other models as backbones.
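
As a small illustration of that workflow, the sketch below initializes a backbone from a Kinetics-pretrained classification checkpoint before fine-tuning it for detection. The checkpoint layout, the `"model"` key, and the `head` prefix are assumptions; each repo ships its own loading utilities.

```python
# Hypothetical sketch: warm-start a detection backbone from a Kinetics-pretrained
# classification checkpoint. Key names and nesting are assumptions; use the
# loading helpers of whichever repo you adopt.
import torch


def load_kinetics_pretrain(backbone, ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)   # some checkpoints nest weights under "model"
    # Drop the classification head; the detector adds its own ROI head instead.
    state = {k: v for k, v in state.items() if not k.startswith("head")}
    missing, unexpected = backbone.load_state_dict(state, strict=False)
    print(f"missing: {missing}\nunexpected: {unexpected}")
    return backbone
```

After loading, the backbone is fine-tuned end to end on the AVA-format annotations together with the detection head.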

OK, thank you for your patience. I see.