Lip2Speech [PDF]
A pipeline for lip-reading a silently speaking face in a video and generating speech for the lip-read content.
Demo figures: video input, processed input, and speech output.
Result figures: alignment plot and mel-spectrogram output.
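The mel-spectrogram shown in the results can be computed from a raw waveform as follows. This is a minimal NumPy sketch; the parameters (16 kHz sample rate, 512-point FFT, 80 mel bands) are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=80):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank and compress to log scale.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-6)
```

The output is a (frames, mel-bands) matrix; a vocoder then inverts such a representation back to a waveform.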
The pretrained model is available here [265.12 MB]
Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:
python3 demo.py
Default arguments:
- dataset: LRW (10 samples)
- root: Datasets/SAMPLE_LRW
- model_path: savedmodels/lip2speech_final.pth
- encoding: voice
evaluate.py computes the ESTOI score for the given Lip2Speech model (higher is better):
python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth
To train the model, run train.py:
python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth
- finetune_model_path: base model to fine-tune on the dataset (optional)