Project description will be added
High-quality Wav2Lip that can be trained on arbitrary datasets. In addition to the training and inference scripts, scripts for the required preprocessing steps are provided.
- Convert the audio sampling rate to 16000 Hz.
- Compute and save the mel-spectrogram for each audio.
- Convert the video frame rate to 25 fps.
- Extract and save raw frames (no face detection) from each video.
- Compute the offset between each audio and video pair by using the pretrained SyncNet. The offset values are needed for the sync-correction of the dataset.
- [Not provided] Estimate the face bounding box. Crop and save the bounding box region for each frame.
- Recommendation: use a high-performance face detection tool rather than S3FD (the one used here). I used InsightFace.
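Although the face-cropping script itself is not provided, the step amounts to cutting the detected bounding box out of each frame. Below is a minimal sketch; the `(x1, y1, x2, y2)` bounding-box format and the 10% margin are assumptions, and the detector call itself (e.g. InsightFace) is left out:

```python
import numpy as np

def crop_face(frame: np.ndarray, bbox, margin: float = 0.1) -> np.ndarray:
    """Crop the (x1, y1, x2, y2) box from an HxWxC frame, expanded by
    `margin` on each side and clipped to the image borders. (Illustrative
    helper, not part of this repository.)"""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    mx = int((x2 - x1) * margin)
    my = int((y2 - y1) * margin)
    x1, y1 = max(0, x1 - mx), max(0, y1 - my)
    x2, y2 = min(w, x2 + mx), min(h, y2 + my)
    return frame[y1:y2, x1:x2]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
crop = crop_face(frame, (400, 200, 600, 400))  # 200x200 box, 10% margin
print(crop.shape)  # (240, 240, 3)
```

In practice the detector's output format varies by library, so the unpacking of `bbox` would need to match whatever tool you choose.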
Changes from the official implementation
- Arbitrary datasets can be used.
- Multi-GPU training is supported.
- To avoid a data-loading bottleneck, mel-spectrograms are computed and saved as `.npy` files beforehand. (Previously, the STFT was computed every time the `__getitem__` function was called.)
- The `FaceEncoder` of SyncNet takes a 48 x 48 lip-region image, rather than a 48 x 96 lower-half image (enabled by the `tighter_box` option).
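The mel-spectrogram precomputation described above can be sketched as follows: spectrograms are saved once as `.npy` files during preprocessing, so `__getitem__` only performs a cheap `np.load` instead of an STFT. The file layout and class name are illustrative, not the repository's actual code:

```python
import tempfile
from pathlib import Path

import numpy as np

class MelDataset:
    """Toy dataset that loads precomputed mel-spectrograms from .npy files."""

    def __init__(self, mel_dir: Path):
        self.paths = sorted(mel_dir.glob("*.npy"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # No STFT here: the spectrogram was computed once during preprocessing.
        return np.load(self.paths[idx])

# Simulate the preprocessing step: save one (80, 100) mel-spectrogram.
mel_dir = Path(tempfile.mkdtemp())
np.save(mel_dir / "clip0.npy", np.random.rand(80, 100).astype(np.float32))

ds = MelDataset(mel_dir)
print(len(ds), ds[0].shape)  # 1 (80, 100)
```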
pip install -r requirements.txt
To begin with, the audio files are resampled to a sampling rate of 16000 Hz. Then the STFT is applied to the resampled audio signals to obtain the corresponding mel-spectrograms.
cd scripts/preprocess
python process_audio.py
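What `process_audio.py` does can be illustrated with plain SciPy/NumPy: resample the waveform to 16000 Hz, then take the STFT. This is a simplified stand-in; the real script's parameters (hop length, number of mel bands, and the mel filterbank applied to the magnitude spectrum) are not reproduced here:

```python
import numpy as np
from scipy.signal import resample_poly, stft

orig_sr, target_sr = 44100, 16000
audio = np.random.randn(orig_sr)  # 1 second of dummy audio

# Resample to 16 kHz (44100 * 160 / 441 == 16000).
resampled = resample_poly(audio, up=160, down=441)

# Magnitude STFT; a mel filterbank would be applied to this in practice.
_, _, spec = stft(resampled, fs=target_sr, nperseg=400, noverlap=240)
mag = np.abs(spec)
print(resampled.shape, mag.shape)
```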
Since the video files downloaded from YouTube have different frame rates (FPS), this rate should be equalized. The `ffmpeg` command-line tool is used for frame rate conversion. The video length remains the same after conversion, so the audio doesn't have to be modified.
python process_video.py
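Under the hood, the frame-rate conversion comes down to an `ffmpeg` call like the one below. The file names are placeholders, and the command is only built and inspected here, not executed:

```python
def make_fps_cmd(src: str, dst: str, fps: int = 25) -> list[str]:
    """Build an ffmpeg command that re-encodes `src` at `fps` frames per
    second. `-r` before the output sets the output frame rate; the audio
    stream's duration is unchanged."""
    return ["ffmpeg", "-y", "-i", src, "-r", str(fps), dst]

cmd = make_fps_cmd("input.mp4", "output_25fps.mp4")
print(" ".join(cmd))  # ffmpeg -y -i input.mp4 -r 25 output_25fps.mp4
```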
The official SyncNet implementation and its pretrained checkpoint are used for sync-correction. All the dependencies should be installed before moving on to the next step.
git clone https://github.com/joonson/syncnet_python.git
cd syncnet_python
Two Python files (`get_offset.py` and `newSyncNetInstance.py`) in the `scripts/preprocess/sync-correction` directory need to be placed in the `syncnet_python` directory. The shift value that minimizes the SyncNet loss is selected as the offset; the offset obtained for each video is recorded in the `output/offset.csv` file. If the input videos are not separated into frame images, add the `--separate_frames` option at the end of the command.
python get_offset.py # --separate_frames
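Conceptually, the offset search evaluates the sync loss at a range of audio-video shifts and keeps the shift with the lowest loss. A minimal sketch, with a dummy quadratic loss standing in for the pretrained SyncNet:

```python
import numpy as np

def best_offset(loss_fn, shifts):
    """Return the shift (in frames) that minimizes the sync loss."""
    losses = np.array([loss_fn(s) for s in shifts])
    return shifts[int(np.argmin(losses))]

# Dummy loss: pretend the pair is best aligned at a shift of +3 frames.
dummy_loss = lambda s: (s - 3) ** 2

offset = best_offset(dummy_loss, list(range(-15, 16)))
print(offset)  # 3
```

The real `get_offset.py` computes the loss from SyncNet's audio and video embeddings; only the argmin-over-shifts structure is shown here.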
git clone https://github.com/yukyeongleee/Wav2Lip-HQ.git
cd Wav2Lip-HQ
python scripts/train_syncnet.py {run_id} # SyncNet training
python scripts/train_wav2lip.py {run_id} # Wav2Lip training
- Add dataset preprocessing scripts
- Add sync-correction scripts
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Contact us: ukok828@gmail.com
Project Link: https://github.com/yukyeongleee/Wav2Lip-HQ
- Rudrabha/Wav2Lip: the official Wav2Lip implementation
- joonson/syncnet_python: the official SyncNet implementation
- Innerverz-AI/CodeTemplate
- othneildrew/Best-README-Template