princeton-vl/pose-hg-demo

What is the difference between valid-ours.h5 and valid-example.h5?

Opened this issue · 5 comments

Hi, I noticed that the results in valid-ours.h5 are better than those in valid-example.h5. Would you please tell me the difference between these two files? Thank you.

Moreover, I found that the released results on the validation set are slightly worse than the results on the test set:

| Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
|---|---|---|---|---|---|---|---|---|
| valid-example | 95.80 | 94.21 | 87.40 | 82.75 | 86.03 | 81.83 | 78.32 | 86.76 |
| valid-ours | 95.94 | 94.68 | 88.53 | 83.38 | 87.48 | 83.09 | 79.05 | 87.56 |

Is the released model different from the model in your arXiv paper? Thanks in advance!

valid-ours.h5 is currently a bit outdated, as is the released model. That said, I generally found validation performance to be around 2% worse than test-set performance.

valid-example.h5 is the file that the validation demo code writes to, so if someone runs it with a different model, the name distinguishes their output from our baseline performance. I just put an arbitrary set of predictions there as filler so that, if the evaluation code were run, it would show different curves. I didn't think much about it at the time, and in hindsight I realize that might be a bit confusing; sorry about that!
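For anyone inspecting these prediction files directly, a minimal Python sketch for loading them is below. Note the dataset key `'preds'` and the `(N, joints, 2)` layout are assumptions based on typical conventions for this repo, not something confirmed in this thread; check `f.keys()` against your own file.

```python
import h5py
import numpy as np

def load_preds(path, key='preds'):
    """Load a joint-prediction array from an .h5 file.

    Assumes the predictions live under the dataset key 'preds'
    (an assumption -- inspect f.keys() if your file differs).
    """
    with h5py.File(path, 'r') as f:
        return np.array(f[key])

# Hypothetical usage, comparing the two released prediction sets:
# example = load_preds('preds/valid-example.h5')
# ours    = load_preds('preds/valid-ours.h5')
# print(np.linalg.norm(example - ours, axis=2).mean())
```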

@anewell Can you please clarify which model architecture was used to generate valid-ours.h5? Was it the 8-stack hourglass used to produce Table 2 in your paper? I wish to use these results as a reference point during development.

Hi, valid-ours.h5 is from an old version of the model, and does not correspond to the output of the 8-stack network. Sorry for any confusion that might cause. You can download the 8-stack model though and run the evaluation to see how it does.
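For reference when re-running the evaluation, the numbers in the table above are PCKh scores: a predicted joint counts as correct when its distance to the ground truth is within a threshold (0.5 by default) times the head segment length. A minimal sketch with synthetic data follows; the array shapes and variable names are illustrative assumptions, not the repo's actual evaluation code.

```python
import numpy as np

def pckh(preds, gt, head_sizes, thr=0.5):
    """Per-joint PCKh in percent.

    preds, gt: (N, K, 2) arrays of (x, y) joint locations.
    head_sizes: (N,) head segment lengths used for normalization.
    """
    dists = np.linalg.norm(preds - gt, axis=2)           # (N, K)
    norm = head_sizes[:, None]                           # (N, 1)
    return 100.0 * np.mean(dists / norm <= thr, axis=0)  # (K,) percentages

# Tiny synthetic check: one sample, two joints, head size 10.
gt = np.array([[[0.0, 0.0], [10.0, 10.0]]])
preds = np.array([[[3.0, 4.0], [10.0, 18.0]]])  # distances: 5.0 and 8.0
head = np.array([10.0])
print(pckh(preds, gt, head))  # joint 0 within 0.5 * 10, joint 1 not
```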

Thanks for your reply. I've found a small bug in the evaluation code (see #14), but other than that I am able to generate validation set predictions just fine.

Would you consider accepting a PR to add the validation set predictions to the preds/ folder (e.g., as valid-hg8.h5, or perhaps replacing valid-ours.h5)? It would be very helpful to have an "official" validation prediction set for your highly influential model, and it would clear up confusion for people who wrongly believe that valid-ours.h5 represents your peak performance.