This project builds on SadTalker to implement Wav2Lip-style lip synthesis for video. It generates audio-driven lip shapes from a video file and applies a configurable enhancement method to the facial region, enhancing the synthesized lip (face) area to improve the clarity of the generated lip shapes.
It uses DAIN, a deep-learning frame interpolation algorithm, to add frames to the generated video and fill in the transitions between synthesized lip shapes, making the result smoother, more realistic, and more natural.
In addition, XTTS is implemented in the Colab notebook, so at the moment the project is effectively an open-source analogue of HeyGen. In the future I plan to add a polished Gradio web interface.
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
# To use the DAIN model for frame interpolation, you also need to install PaddlePaddle.
# CUDA 11.2
python -m pip install paddlepaddle-gpu==2.3.2.post112 \
-f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
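After running the install commands above, a quick check that the packages can be found may save a failed inference run. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it only tests whether the modules and the `ffmpeg` executable can be located, not whether their versions or CUDA builds match.

```python
import importlib.util
import shutil

# Import names for the packages installed above. "paddle" is the import
# name of PaddlePaddle and is only needed for DAIN frame interpolation.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio"]
OPTIONAL_MODULES = ["paddle"]


def find_missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("missing required modules:", find_missing_modules(REQUIRED_MODULES))
    print("missing optional modules:", find_missing_modules(OPTIONAL_MODULES))
    print("ffmpeg on PATH:", ffmpeg_available())
```

An empty list for the required modules and `True` for ffmpeg suggest the base environment is in place; `paddle` only matters if you plan to pass `--use_DAIN`.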
SadTalker-Video-Lip-Sync
├──checkpoints
| ├──BFM_Fitting
| ├──DAIN_weight
| ├──hub
| ├── ...
├──dian_output
| ├── ...
├──examples
| ├── audio
| ├── video
├──results
| ├── ...
├──src
| ├── ...
├──sync_show
├──third_part
| ├── ...
├──...
├──inference.py
├──README.md
python inference.py --driven_audio <audio.wav> \
                    --source_video <video.mp4> \
                    --enhancer <none,lip,face> \
                    --use_DAIN \
                    --time_step 0.5
# --enhancer: default is lip
# --use_DAIN: enabling this uses a large amount of GPU memory and takes a long time
# --time_step: frame interpolation step, default 0.5 (25fps -> 50fps); 0.25 gives 25fps -> 100fps
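If you want to launch inference from a script, the command can be assembled as an argument list. This is a rough sketch that uses only the flags shown above; the helper names and the example input paths are hypothetical. The frame rate formula (source fps divided by `time_step`) is inferred from the two examples given (0.5 gives 25 to 50 fps, 0.25 gives 25 to 100 fps), not taken from the project code.

```python
import shlex


def interpolated_fps(source_fps, time_step):
    """Frame rate after interpolation, assuming fps scales as 1 / time_step."""
    if time_step <= 0:
        raise ValueError("time_step must be positive")
    return source_fps / time_step


def build_inference_command(audio, video, enhancer="lip",
                            use_dain=False, time_step=0.5):
    """Build the argument list for inference.py from the documented flags."""
    if enhancer not in ("none", "lip", "face"):
        raise ValueError("enhancer must be one of: none, lip, face")
    cmd = ["python", "inference.py",
           "--driven_audio", audio,
           "--source_video", video,
           "--enhancer", enhancer,
           "--time_step", str(time_step)]
    if use_dain:
        cmd.append("--use_DAIN")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        "examples/audio/input.wav",   # hypothetical file name
        "examples/video/input.mp4",   # hypothetical file name
        enhancer="face", use_dain=True, time_step=0.25)
    print(shlex.join(cmd))
    print("expected output fps:", interpolated_fps(25, 0.25))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` instead of a shell string avoids quoting problems with file names that contain spaces.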
# Sample results are in the ./sync_show directory:
# original.mp4          original video
# sync_none.mp4         synthesis without enhancement
# none_dain_50fps.mp4   DAIN frame interpolation only, 25fps -> 50fps
# lip_dain_50fps.mp4    lip-region enhancement for clearer lip shapes + DAIN interpolation, 25fps -> 50fps
# face_dain_50fps.mp4   full-face enhancement for clearer lip shapes + DAIN interpolation, 25fps -> 50fps
# Videos generated by different methods, for comparison:
# our.mp4               video generated by this project (SadTalker-Video-Lip-Sync)
# sadtalker.mp4         full video generated by SadTalker
# retalking.mp4         video generated by video-retalking
# wav2lip.mp4           video generated by Wav2Lip
lip_sync.mp4
The spliced comparison video is normalized to 25 fps, so the effect of frame interpolation is not visible in it. For details, compare the individual videos in the ./sync_show directory.
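To confirm the frame rate of an individual video in ./sync_show, one option is to query it with `ffprobe`, which ships with ffmpeg. The sketch below is an illustration under that assumption; the function names are hypothetical and nothing here comes from the project code.

```python
import subprocess
from fractions import Fraction


def parse_frame_rate(text):
    """Convert ffprobe's rational frame rate string (e.g. "50/1") to a float."""
    first_line = text.strip().splitlines()[0]
    return float(Fraction(first_line))


def probe_frame_rate(video_path):
    """Ask ffprobe for the frame rate of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1",
         video_path],
        capture_output=True, text=True, check=True)
    return parse_frame_rate(result.stdout)
```

For example, `probe_frame_rate("./sync_show/lip_dain_50fps.mp4")` reports the rate stored in that file, which you can compare against the spliced video.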
Comparison of lip synthesis results from this project, SadTalker, video-retalking, and Wav2Lip:
our | sadtalker |
---|---|
our_sync.mp4 | sadtalker_sync.mp4 |
retalking | wav2lip |
retalking_sync.mp4 | wa2lip_sync.mp4 |
The videos shown in this README have been resized. The full-size videos synthesized by each method are in the ./sync_show directory for comparison.
The pre-trained models are laid out as follows:
├──checkpoints
| ├──BFM_Fitting
| ├──DAIN_weight
| ├──hub
| ├──auido2exp_00300-model.pth
| ├──auido2pose_00140-model.pth
| ├──epoch_20.pth
| ├──facevid2vid_00189-model.pth.tar
| ├──GFPGANv1.3.pth
| ├──GPEN-BFR-512.pth
| ├──mapping_00109-model.pth.tar
| ├──ParseNet-latest.pth
| ├──RetinaFace-R50.pth
| ├──shape_predictor_68_face_landmarks.dat
| ├──wav2lip.pth
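Before running inference, it can help to check that every entry in the listing above is present. A minimal sketch, assuming the entries sit directly under `checkpoints/` as shown; the function name is hypothetical, and the names are copied verbatim from the listing, including their original spelling.

```python
from pathlib import Path

# Entries copied from the checkpoints listing above.
EXPECTED_CHECKPOINTS = [
    "BFM_Fitting",
    "DAIN_weight",
    "hub",
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "epoch_20.pth",
    "facevid2vid_00189-model.pth.tar",
    "GFPGANv1.3.pth",
    "GPEN-BFR-512.pth",
    "mapping_00109-model.pth.tar",
    "ParseNet-latest.pth",
    "RetinaFace-R50.pth",
    "shape_predictor_68_face_landmarks.dat",
    "wav2lip.pth",
]


def missing_checkpoints(checkpoint_dir, expected=EXPECTED_CHECKPOINTS):
    """Return the expected entries that do not exist under checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints("checkpoints")
    print("all checkpoints present" if not missing else f"missing: {missing}")
```

The check only looks at names, so a truncated or corrupted download would still pass.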
Download links for the pre-trained model checkpoints:
Baidu Netdisk: https://pan.baidu.com/s/15-zjk64SGQnRT9qIduTe2A Extraction code: klfv
Google Drive: https://drive.google.com/file/d/1lW4mf5YNtS4MAD7ZkAauDDWp2N3_Qzs7/view?usp=sharing
Quark Netdisk: https://pan.quark.cn/s/2a1042b1d046 Extraction code: zMBP
# Download the archive and extract it into the project directory (required when downloading from Google Drive or Quark Netdisk)
cd SadTalker-Video-Lip-Sync
tar -zxvf checkpoints.tar.gz
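On systems without a `tar` command, the same extraction can be done with Python's standard library. This is a sketch under the assumption that the archive is a gzip-compressed tarball named as above; the function names are hypothetical.

```python
import tarfile
from pathlib import Path, PurePosixPath


def top_level_names(member_names):
    """Return the sorted set of top-level entries among archive member names."""
    names = set()
    for name in member_names:
        parts = PurePosixPath(name).parts
        if parts:
            names.add(parts[0])
    return sorted(names)


def extract_checkpoints(archive_path, project_dir="."):
    """Extract a .tar.gz archive into project_dir and return its top-level entries."""
    project_dir = Path(project_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        extracted = top_level_names(archive.getnames())
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support a safer extraction filter.
            archive.extractall(project_dir, filter="data")
        else:
            archive.extractall(project_dir)
    return extracted


if __name__ == "__main__":
    print(extract_checkpoints("checkpoints.tar.gz", "."))
```

Run it from the SadTalker-Video-Lip-Sync directory so the extracted files land in the project path.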
- SadTalker: https://github.com/Winfredy/SadTalker
- VideoReTalking: https://github.com/vinthony/video-retalking
- DAIN: https://arxiv.org/abs/1904.00830
- PaddleGAN: https://github.com/PaddlePaddle/PaddleGAN