una-dinosauria/3d-pose-baseline

Fine-tuning SH procedure question

maf2418 opened this issue · 6 comments

Hi, I am trying to reproduce an element of your research. Looking at the fine-tuned SH data you provide, I notice that the mean pixel error of the SH detections for subject 11 is lower than the error on the training set, while subject 9 is about 4 pixels worse. I am slightly surprised that S11 does so well, so I wanted to confirm whether the fine-tuning was done on just the training subjects or on the full dataset. (Normally I would expect the former, but the latter might also make sense since the paper focuses on lifting accuracy.) Clarifying that would help my understanding.
cheers

Thanks for your question. We only measured these errors during validation, and we used the same train/test partitions for fine-tuning SH as for the 2d->3d part.
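
For reference, a minimal sketch of the standard Human3.6M split (the constant names here are just illustrative, but the subject IDs match the protocol used for both the SH fine-tuning and the 2d->3d network):

```python
# Standard Human3.6M protocol: subjects used for training vs. evaluation.
# The same partition is used for fine-tuning SH and for the 2d->3d lifting network.
TRAIN_SUBJECTS = [1, 5, 6, 7, 8]
TEST_SUBJECTS = [9, 11]
```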

I'm a bit confused by your question though. As far as I remember, the test set consists of subjects 9 and 11, so what exactly is surprising about one being better and one worse? In my head that sounds pretty intuitive.

Sorry, bad typo on my part: I meant S11 has smaller errors than the training set (I mistakenly wrote test set; sorry, it was late! I will edit my question so it does not confuse later readers).
Thanks for clarifying what you did. When I tried fine-tuning, my training detections ended up much more accurate than my validation detections, which creates problems when doing the lifting. I suspect I overfit the training data during fine-tuning, whereas your detector seems to do nearly as well on S9/S11 as on the training subjects.
I did not follow your exact methodology, though, so I will try fine-tuning again. Thanks again for the prompt clarification. cheers

You may find #20 (comment) useful to reproduce our SH fine-tuning.

Cheers,

Just coming back a week later to report that I resolved the problem. Boneheaded mistake on my part! I thought your provided "StackedHourglass" dataset was the fine-tuned set, so I was looking at the wrong data... I did not realize you also provide a separate "StackedHourglassFineTuned240" set. Obviously the plain stacked hourglass detections should show comparable errors on the training and validation subjects, since that network was never tuned on Human3.6M data, and, as I expected, the fine-tuned set does better on training than on validation.
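
For anyone double-checking the same thing, here is a minimal sketch (with placeholder arrays standing in for real data) of the per-subject pixel-error comparison; swap in the actual detections loaded from the "StackedHourglass" and "StackedHourglassFineTuned240" folders and the ground-truth 2D projections:

```python
import numpy as np

def mean_pixel_error(pred_2d, gt_2d):
    """Mean Euclidean distance (in pixels) between detected and ground-truth 2D joints.
    Both arrays have shape (n_frames, n_joints, 2)."""
    return float(np.linalg.norm(pred_2d - gt_2d, axis=-1).mean())

# Placeholder data; replace with detections read from the provided .h5 files
# and the ground-truth 2D joint positions from the Human3.6M annotations.
subjects = [1, 5, 6, 7, 8, 9, 11]
rng = np.random.default_rng(0)
gt = {s: rng.uniform(0, 1000, (100, 16, 2)) for s in subjects}         # ground truth
sh = {s: gt[s] + rng.normal(0, 15, (100, 16, 2)) for s in subjects}    # vanilla SH
sh_ft = {s: gt[s] + rng.normal(0, 8, (100, 16, 2)) for s in subjects}  # fine-tuned SH

for name, det in [("SH", sh), ("SH fine-tuned", sh_ft)]:
    for s in subjects:
        print(f"{name:14s} S{s:<2d}: {mean_pixel_error(det[s], gt[s]):6.2f} px")
```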

Just reporting back out of politeness, so as not to confuse anybody else in the future.
cheers,
Martin

Oh thanks so much for clearing that up! :)

Hi maf2418, I have also applied several times for the fine-tuned SH data at "https://drive.google.com/open?id=0BxWzojlLp259S2FuUXJ6aUNxZkE", but never got a reply. Could you share the data with me? Or could you please tell me what to write when applying? Thank you in advance.