DirtyHarryLYL/HAKE-Action-Torch

How to train a common dataset?

xudif opened this issue · 6 comments

xudif commented

Dear authors:
Hello, I want to train this network on my own dataset, starting from the corresponding pre-trained model. How can I fine-tune it for custom part state tags and human activities using a dataset labeled in COCO format?

hwfan commented

Thanks for playing with Activity2Vec!
Does "a dataset labeled in COCO format" mean a dataset with human boxes and keypoints or an action dataset organized like COCO?
To implement a custom dataset, you can follow the dataset definition in activity2vec/dataset/hake_dataset.py, which includes the interfaces for images and annotations. The procedure for generating part boxes is given in tools/inference_tools/part_box_generation.py.
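
As a rough illustration of the conversion step, the sketch below groups COCO person annotations per image and converts each box from COCO's [x, y, width, height] layout to corner form. The output field names (`human_box`, `keypoints`) and the corner-box layout are assumptions made for illustration only; check hake_dataset.py for the fields the HAKE interface actually expects.

```python
from collections import defaultdict

# In the standard COCO label map, category id 1 is "person".
# Adjust this if your dataset uses a different id.
PERSON_CATEGORY_ID = 1


def group_person_annotations(coco):
    """Group person annotations of a COCO-style dict by image file name.

    The returned record layout is hypothetical, not the HAKE format.
    """
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    records = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] != PERSON_CATEGORY_ID:
            continue  # skip object boxes here; handle them separately if needed
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        records[id_to_file[ann["image_id"]]].append({
            "human_box": [x, y, x + w, y + h],  # assumed corner format
            "keypoints": ann.get("keypoints", []),
        })
    return dict(records)


# Tiny in-memory example; in practice, load the dict with json.load().
sample = {
    "images": [{"id": 7, "file_name": "a.jpg"}],
    "annotations": [
        {"image_id": 7, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"image_id": 7, "category_id": 3, "bbox": [0, 0, 5, 5]},
    ],
}
grouped = group_person_annotations(sample)
```

The part state and activity labels stored in separate JSON files could then be attached to each record by image name and person index before feeding them to a dataloader.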

xudif commented

Thanks for the response. In my own dataset, the boxes for people and related objects are in COCO format, which is used to detect people and related objects in the raw image, and the part state tags and human activities are stored as JSON files. For the custom dataset, do you mean I should write a new dataloader, modeled on hake_dataset.py, that takes images and annotations in the same format used for object detection? To predict different part state tags and human activities, do I need to modify other code besides changing the parameters of part_box_generation.py?

hwfan commented

For convenience, if you want to build your own dataloader, you can transform your data format to the original interface of the HAKE dataset. Besides the definition of the part boxes, you may want to change the following code and configs:

  1. The numbers of part states and activities. See the DATA config in configs/a2v/verb.yaml and activity2vec/ult/config.py.
  2. The precomputed loss weights, since they are generated from the statistics of our HAKE dataset. See the function loss_reweight in activity2vec/ult/misc.py.
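
For the second point, one common way to derive per-class weights from a new dataset's label statistics is inverse-frequency weighting. The sketch below is a generic example of that idea, not the formula used by loss_reweight; the function name `inverse_frequency_weights` and the normalization are assumptions.

```python
def inverse_frequency_weights(labels):
    """Compute per-class weights from multi-hot label vectors.

    labels: list of equal-length lists of 0/1 values, one per sample.
    Returns weights proportional to 1 / positive_count, scaled so that
    their mean is 1. This is a generic scheme, not HAKE's own formula.
    """
    num_classes = len(labels[0])
    counts = [0] * num_classes
    for vec in labels:
        for c, value in enumerate(vec):
            counts[c] += value
    # Clamp counts to 1 so classes with no positives do not divide by zero;
    # such classes end up with the largest weight.
    raw = [1.0 / max(count, 1) for count in counts]
    mean_raw = sum(raw) / num_classes
    return [r / mean_raw for r in raw]


# Example: class 0 is frequent, class 1 is rarer, class 2 never appears.
example_labels = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
weights = inverse_frequency_weights(example_labels)
```

Rarer classes receive larger weights, so the resulting list would need to match the new part state and activity counts set in the config.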

xudif commented

Thank you for your patient reply. I think I know what to do. I have another question: I found in the code that the language model and PaStaNet*-GCN are not used in model training or inference. Is that because they do not help with the classification of part state tags and verbs?

Yeah, in this version the GCN is not used. We found that the current part state features perform comparably with a GCN and with a simpler linear FC layer, so we just use the more concise FC as the mapping.
We are working on feature learning and have found some interesting results. Maybe this summer, we will upgrade the model to provide a more powerful backbone and classifier.

xudif commented

Wow, I am looking forward to your new paper and the upgraded model. Best wishes.