joonson/syncnet_python

Is the input a face image or a lip image?

The paper says the input is a lip image. But in this repo and in example.avi, the whole face is kept and processed, with no cropping to part of the face. Your Keras version uses only the lips.
So for the pretrained model syncnet_v2.model, what kind of input image should we use?

#9
It seems to be the full face. I can't understand why this is inconsistent with the paper, and there isn't any explanation.