Clarifications for implementation

Question

Closed this issue 6 years ago · 10 comments

Congrats on the nice work. I have been trying to reproduce your results and implemented the network by building on the code provided in your pose regression repo. I have a few questions and would appreciate it if you could respond to the following:

Can you share exactly which portion of the MPII dataset you used for training 2D action recognition?
Can you share the parameters (mean/variance) you used to generate the GT heatmaps for the pose estimation network?

Thanks,

Answer 1 · 2018-10-10T07:16:54.000Z

Hi @ashar-ali,

For single-person MPII, there is a standard split between training and validation samples.
I uploaded the file mpii_annotations.mat with this split.
We use soft-argmax to regress joint coordinates directly, so the method does not rely on generating GT heatmaps. Please take a look at the paper for more details.
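
For readers unfamiliar with the idea, here is a minimal NumPy sketch of a 2D soft-argmax. It is an illustration under stated assumptions, not the code from the repository: the function name `soft_argmax_2d` and the choice of normalizing coordinates to [0, 1] at pixel centers are hypothetical. The point is that a spatial softmax followed by an expected-coordinate computation is differentiable, so the network can be supervised on joint coordinates without target heatmaps.

```python
import numpy as np


def soft_argmax_2d(heatmap):
    """Return the expected (x, y) position of a 2D score map, in [0, 1].

    Hypothetical helper for illustration only; the coordinate convention
    (pixel centers normalized by width/height) is an assumption.
    """
    h, w = heatmap.shape
    # Spatial softmax over all pixels; subtract the max for numerical stability.
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    # Normalized coordinates of the pixel centers along each axis.
    xs = (np.arange(w) + 0.5) / w
    ys = (np.arange(h) + 0.5) / h
    # Expected coordinate = sum of (marginal probability * coordinate).
    x = float((p.sum(axis=0) * xs).sum())
    y = float((p.sum(axis=1) * ys).sum())
    return x, y


# A sharply peaked map at row 2, column 5 of an 8x8 grid.
peaked = np.zeros((8, 8))
peaked[2, 5] = 50.0
print(soft_argmax_2d(peaked))  # close to (0.6875, 0.3125)
```

With a flat map the result falls at the center (0.5, 0.5), which shows that the output is an expectation over the whole map rather than a hard maximum.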

Best,

Answer 2 · 2018-10-10T18:53:32.000Z

Oh great,

I figured out the soft-argmax thing. But can you please re-verify the link you provided above for the file download? It throws a 404 Not Found error when I click on it. Also, I remember you released the weights for MPII yesterday, but I am not able to access them today.

Would be a great help.

Thanks @dluvizon

Answer 3 · 2018-10-10T21:45:33.000Z

The links should work now!

Answer 4 · 2018-10-11T01:06:26.000Z

Great,

Thanks a ton for providing these. Do you also plan to release models for pose estimation and/or activity recognition on the Penn Action dataset anytime soon?

Thanks,

Answer 5 · 2018-10-11T06:51:28.000Z

I am planning to release the weights finetuned for action, for both Penn and NTU.

Answer 6 · 2018-10-11T18:27:19.000Z

Thanks @dluvizon,

As a sanity check experiment, I was also trying to train the action recognition nets independently for a few epochs.

By independently, I mean I just took the pose ground truth and tried to learn actions with categorical cross-entropy.
Similarly, I extracted the appearance features and pose heat maps offline and tried to learn action categories with the hyperparameters mentioned in Appendix B of the paper.

Questions:

In both of the above cases, I could only reach about 8% accuracy on the training data itself after 3-4 epochs. Is this performance expected, or should I get at least some reasonable accuracy with this kind of offline, independent learning?
Do you suggest jumping directly to joint training with the pose estimator network (after 2 epochs), as mentioned in the paper?

P.S. All of the above is based on experiments I did on the Penn Action dataset. Features and probability heat maps were extracted from the 2D pose estimator (trained on MPII) that you provided.

Answer 7 · 2018-10-12T13:03:24.000Z

Hi @ashar-ali,

Considering that PennAction contains 15 classes, an accuracy of ~8% means you are getting essentially random predictions.
In my experience, after 3-4 epochs it should be close to 80% using visual features, and a bit lower using only pose.
If your net is not learning at all, I guess that the problem is not with the pose data.

PennAction is a pretty easy dataset, so even a naive method should attain 80% relatively fast.
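
As a quick way to check the "random predictions" diagnosis, the sketch below computes two reference points to compare a training accuracy against. The helper names `chance_accuracy` and `majority_baseline` are hypothetical, and the uniform-chance figure assumes roughly balanced classes, which may not hold exactly for PennAction.

```python
from collections import Counter


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing (assumes balanced classes)."""
    return 1.0 / num_classes


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


print(f"{chance_accuracy(15):.1%}")  # 6.7%
```

An accuracy that stays near these values after several epochs suggests the network is not learning from its inputs at all, which usually points to the data pipeline or labels rather than the model capacity.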

Answer 8 · 2018-10-17T18:27:42.000Z

Hi @dluvizon,

Thanks for sharing these insights. I think it could also be because I am not using any data augmentation for now, since I was only doing a proof of concept, and the architecture is not converging because of that.

Now that you have uploaded the full code for the action model as well as the weights, I will try to reproduce its results.

Can you please point me to the annotations.mat file for the Penn Action dataset?
It would also be great if you could verify that I am encoding the labels correctly:

'baseball_pitch', - 0
'baseball_swing', - 1
'bench_press', - 2
'bowl', - 3
'clean_and_jerk', - 4
'golf_swing', - 5
'jump_rope', - 6
'jumping_jacks', - 7
'pullup', - 8
'pushup', - 9
'situp', - 10
'squat', - 11
'strum_guitar', - 12
'tennis_forehand', - 13
'tennis_serve' - 14

Answer 9 · 2018-10-18T10:48:22.000Z

Hi,

The file should be OK now at https://github.com/dluvizon/deephar/releases/download/v0.3/penn_annotations.mat

You can check the Penn Action labels by doing:

    print(penn_seq.action_labels)

just after loading the dataset. That gives:

['baseball_pitch' 'baseball_swing' 'bench_press' 'bowl' 'clean_and_jerk'
 'golf_swing' 'jump_rope' 'jumping_jacks' 'pullup' 'pushup' 'situp' 'squat'
 'strum_guitar' 'tennis_forehand' 'tennis_serve']

which corresponds to your list.
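
For anyone encoding the labels by hand, a small sketch of deriving the integer indices from the label order above, rather than typing them manually. The names `label_to_index` and `one_hot` are hypothetical helpers for illustration; only the label order comes from the output shown above.

```python
import numpy as np

# Label order copied from the printed penn_seq.action_labels above.
action_labels = [
    'baseball_pitch', 'baseball_swing', 'bench_press', 'bowl',
    'clean_and_jerk', 'golf_swing', 'jump_rope', 'jumping_jacks',
    'pullup', 'pushup', 'situp', 'squat',
    'strum_guitar', 'tennis_forehand', 'tennis_serve',
]

# Class index = position in the list (hypothetical helper mapping).
label_to_index = {name: idx for idx, name in enumerate(action_labels)}


def one_hot(name):
    """One-hot target vector for categorical cross-entropy training."""
    vec = np.zeros(len(action_labels), dtype=np.float32)
    vec[label_to_index[name]] = 1.0
    return vec


print(label_to_index['tennis_serve'])  # 14
```

Building the mapping from the list keeps the indices consistent with the dataset loader if the label order ever changes.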

Answer 10 · 2018-10-18T20:36:49.000Z

Sounds great,

Thanks a lot for all your help @dluvizon