Different results when images of person taken at different distances
asimniazi63 opened this issue · 1 comments
Hi,
Yes, the system at the time this code was released was quite susceptible to changes in relative person size, because the training inputs were cropped to a bounding box around the synthetic silhouettes/joints (with a small random bbox scaling factor applied for augmentation).
The standard way to deal with this is to also crop any test inputs around the detected silhouette/joints before 3D prediction, mimicking the training data preprocessing, and then un-crop after prediction. It seems like I forgot to implement that in the code released here 😄. I'll get around to it when I've got some time.
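In case it helps anyone hitting this before a proper fix lands, here's a rough sketch of what that test-time cropping could look like (the function name, the `scale=1.2` padding factor and the 256×256 input size below are placeholders, not necessarily the values this repo actually trains with):

```python
import numpy as np
import cv2


def crop_around_silhouette(image, silhouette, out_size=256, scale=1.2):
    """Crop a square box around the detected silhouette, roughly mimicking
    the training-time preprocessing, and return the crop plus the box
    needed to map predictions back to the original image frame."""
    ys, xs = np.where(silhouette > 0)
    if len(xs) == 0:
        raise ValueError("Empty silhouette - nothing to crop around.")

    # Tight bounding box around the person, then expand it slightly
    # (scale > 1) so the crop resembles the padded training crops.
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = 0.5 * scale * max(x_max - x_min, y_max - y_min)

    # Clamp the square box to the image bounds.
    x0 = int(max(cx - half, 0))
    y0 = int(max(cy - half, 0))
    x1 = int(min(cx + half, image.shape[1]))
    y1 = int(min(cy + half, image.shape[0]))

    crop = image[y0:y1, x0:x1]
    crop = cv2.resize(crop, (out_size, out_size),
                      interpolation=cv2.INTER_LINEAR)
    return crop, (x0, y0, x1, y1)
```

You'd feed `crop` to the 3D prediction network instead of the full frame, and the returned box tells you where the prediction lives in the original image for the un-crop step (e.g. shifting/rescaling predicted 2D joints back, or adjusting the predicted camera scale/translation accordingly).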
You could also try increasing the random bbox scaling range in data augmentation to train the network to be more robust, but the test-time solution makes more sense and will probably work better.
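For the augmentation route, the change is essentially just widening the range the random bbox scaling factor is drawn from when the training crops are built, something along these lines (the range here is illustrative, not the value used in this repo):

```python
import numpy as np

# Hypothetical wider random bbox scaling range for training-crop
# generation, so the network sees people at more varied relative sizes.
BBOX_SCALE_RANGE = (0.8, 1.6)  # placeholder values, tune to taste


def random_bbox_scale(rng=np.random):
    """Draw a random scale factor applied to the tight person bbox."""
    low, high = BBOX_SCALE_RANGE
    return rng.uniform(low, high)

# e.g. half_size = 0.5 * random_bbox_scale() * max(bbox_width, bbox_height)
```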