philgras/neural-head-avatars

tracking result not right

Closed this issue · 4 comments

Hi, the tracking visualization looks abnormal. Can you point out why?

I set the tracking resolution to 512x512.

image

Hi, sorry for the confusion. The parameter tracking_resolution should match the aspect ratio of the input frames. It seems like your frames are not square but rectangular (landscape). If you have enough GPU memory and want to track at full resolution, you can simply set it to the pixel height and width of your frames.
One remark: For better preprocessing outputs, I would recommend choosing a tighter crop around the head of the subject.
I updated the README accordingly and added some additional recommendations for using custom videos. Thanks!
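
To illustrate the aspect-ratio requirement, here is a minimal sketch of how a matching resolution could be computed from the frame size. The helper `matching_resolution` and its `short_side` argument are hypothetical and not part of the repository; the sketch only assumes that the resolution is given as (height, width).

```python
from typing import Tuple


def matching_resolution(frame_h: int, frame_w: int, short_side: int) -> Tuple[int, int]:
    """Return (height, width) with the frame's aspect ratio.

    Hypothetical helper: the shorter side of the result equals `short_side`,
    the longer side is scaled by the same factor and rounded.
    """
    if frame_h <= 0 or frame_w <= 0 or short_side <= 0:
        raise ValueError("all sizes must be positive")
    scale = short_side / min(frame_h, frame_w)
    return round(frame_h * scale), round(frame_w * scale)


if __name__ == "__main__":
    # A landscape 1080x1920 frame tracked with a shorter side of 512 pixels.
    print(matching_resolution(1080, 1920, 512))
```

For a landscape frame, a square setting such as 512x512 does not preserve the aspect ratio, which is the mismatch described above.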

@philgras Hi, I have some suggestions. If the aspect ratio needs to be kept, then just let users set a scale ratio rather than a width and height, since the width and height can be calculated from the original frame. That would be much more robust, and users could also lower the ratio if GPU resources are limited.

Regarding the remark: this image is already the output of the first step, which actually does face alignment and keypoint detection. Why not crop directly in the first step, so that users don't need to manually or intentionally find videos where the head is lower in the frame?

Hi, thanks for the suggestion. I agree this makes it less cumbersome. I replaced the --tracking_resolution parameter with a --downscale_factor parameter, which reduces the resolution by the given scalar factor. Since I assume most people will track at full resolution, I made it optional.
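
As a rough sketch of what such a factor could mean in practice, the following assumes that both frame dimensions are divided by the factor and that omitting it keeps the full resolution. The function `downscaled_resolution` is hypothetical, and the exact semantics of --downscale_factor in the repository may differ.

```python
from typing import Tuple


def downscaled_resolution(frame_h: int, frame_w: int,
                          downscale_factor: float = 1.0) -> Tuple[int, int]:
    """Return (height, width) after dividing both sides by `downscale_factor`.

    Assumption: a factor of 2 halves each dimension; the default of 1.0
    keeps the full resolution. The aspect ratio is preserved up to rounding.
    """
    if downscale_factor <= 0:
        raise ValueError("downscale_factor must be positive")
    return (max(1, round(frame_h / downscale_factor)),
            max(1, round(frame_w / downscale_factor)))


if __name__ == "__main__":
    print(downscaled_resolution(1080, 1920))       # full resolution
    print(downscaled_resolution(1080, 1920, 2.0))  # half resolution per side
```

Because a single scalar is applied to both dimensions, the aspect ratio cannot be set inconsistently with the input frames.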

Regarding the remark: While face-alignment is very robust, the other preprocessing networks for face parsing and normal prediction rely heavily on suitable inputs. For example, if I process a screenshot of the image you posted here, the normal maps and face parsing results are quite noisy and will not provide a good optimization signal.
image
If you compare this to figure 3 in the paper or, for example, to the outputs using my github profile image, there is a significant difference in segmentation and normal quality.
image
To overcome this, it is best to record a video of the target person with good contrast and then choose a tight crop around the head that is roughly 512x512 or larger.

I can add an option that chooses this crop automatically based on the facial bounding boxes over the whole video - if you think this is helpful.
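
One way such an automatic crop could work is sketched below. This is an illustration under stated assumptions, not the repository's implementation: boxes are assumed to be (x0, y0, x1, y1) pixel tuples, one per frame, and the function name `square_crop_from_boxes` and the `margin` parameter are hypothetical.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def square_crop_from_boxes(boxes: Iterable[Box], frame_h: int, frame_w: int,
                           margin: float = 0.2) -> Tuple[int, int, int]:
    """Return (left, top, side) of one square crop covering all face boxes.

    The crop is centered on the union of the boxes, enlarged by `margin`,
    limited to the shorter frame side, and shifted to stay inside the frame.
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one bounding box")
    # Union of all per-frame boxes, so the head stays inside for the whole video.
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    side = max(x1 - x0, y1 - y0) * (1.0 + margin)
    side = int(min(round(side), frame_h, frame_w))
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = int(round(cx - side / 2))
    top = int(round(cy - side / 2))
    # Shift the square back into the frame if it sticks out.
    left = max(0, min(left, frame_w - side))
    top = max(0, min(top, frame_h - side))
    return left, top, side


if __name__ == "__main__":
    face_boxes = [(800, 200, 1000, 450), (820, 210, 1040, 460)]
    left, top, side = square_crop_from_boxes(face_boxes, 1080, 1920)
    print(left, top, side)
    if side < 512:
        print("crop is smaller than the suggested ~512x512")
```

Using one fixed crop for the whole video keeps the framing stable across frames; the final check relates the result to the roughly 512x512 size suggested above.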

Thank you for your detailed explanation. I will help test more videos if I can get it to run fully.