Lip2Speech [PDF]
A pipeline for lip-reading a silently speaking face in a video and generating speech for the lip-read content.
Demo figures: video input, processed input, and speech output.
Result figures: alignment plot and mel-spectrogram output.
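The mel-spectrogram shown in the results can be computed from a raw waveform as follows. This is a minimal NumPy sketch; the parameters (16 kHz sample rate, 512-point FFT, 80 mel bands) are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=80):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank and compress to log scale.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-6)
```

The output is a (frames, mel-bands) matrix; a vocoder then inverts such a representation back to a waveform.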
The pretrained model is available here [265.12 MB]
Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:
python3 demo.py
Default arguments:
- dataset: LRW (10 samples)
- root: Datasets/SAMPLE_LRW
- model_path: savedmodels/lip2speech_final.pth
- encoding: voice
evaluate.py computes the ESTOI score for the given Lip2Speech model (higher is better):
python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth
To train the model, run train.py:
python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth
- finetune_model_path: base model to fine-tune on the dataset (optional)