- Install Anaconda, Python and `git`.
- Create the environment and install the requirements.
git clone https://github.com/OpenTalker/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
### Coqui TTS is optional for gradio demo.
### pip install TTS
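Once the commands above have finished, a quick sanity check can confirm that the tools and packages are visible from the `sadtalker` environment. The following is a minimal sketch and not part of SadTalker: the helper name `check_environment` is hypothetical. It only tests whether `git` and `ffmpeg` are on `PATH` and whether the `torch` packages can be located, without importing them.

```python
import importlib.util
import shutil


def check_environment(tools=("git", "ffmpeg"),
                      packages=("torch", "torchvision", "torchaudio")):
    """Report which command-line tools and Python packages are visible.

    Hypothetical helper, not part of SadTalker. Returns a dict mapping
    each name to True (found) or False (missing).
    """
    report = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    for package in packages:
        # find_spec locates a top-level package without importing it.
        report[package] = importlib.util.find_spec(package) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the `sadtalker` environment activated; any entry reported as missing points to the install step that needs repeating.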
A video tutorial in Chinese is available here. You can also follow these instructions:
- Install Python 3.8 and check "Add Python to PATH".
- Install git manually or using Scoop: `scoop install git`.
- Install `ffmpeg`, following this tutorial or using Scoop: `scoop install ffmpeg`.
- Download the SadTalker repository by running `git clone https://github.com/Winfredy/SadTalker.git`.
- Download the checkpoints and gfpgan models in the downloads section.
- Run `start.bat` from Windows Explorer as a normal, non-administrator user, and a Gradio-powered WebUI demo will start.
A tutorial on installing SadTalker on macOS can be found here.
Please check out additional tutorials here.
You can run the following script on Linux/macOS to automatically download all the models:
bash scripts/download_models.sh
We also provide an offline patch (`gfpgan/`), so no model will be downloaded when generating.
Pre-trained models:
- Google Drive
- GitHub Releases
- Baidu (百度云盘) (Password: `sadt`)

GFPGAN offline patch:
- Google Drive
- GitHub Releases
- Baidu (百度云盘) (Password: `sadt`)
Model Details
Model descriptions:
Model | Description |
---|---|
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render). |
checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render). |
gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`. |
Model | Description |
---|---|
checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reproduction of face-vid2vid. |
checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2Lip. |
checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
checkpoints/BFM | 3DMM library file. |
checkpoints/hub | Face detection models used in face alignment. |
gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`. |
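The paths in the tables above can be checked with a short script before the first run. This is a minimal sketch, assuming it is run from the repository root: `missing_models` and `EXPECTED_MODELS` are hypothetical names, and the list is copied from the first table only, so adjust it if you downloaded the files from the second table.

```python
from pathlib import Path

# Paths copied from the first table above; edit this tuple if you use
# the files listed in the second table instead.
EXPECTED_MODELS = (
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "gfpgan/weights",
)


def missing_models(repo_root=".", expected=EXPECTED_MODELS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected model files are present.")
```

The check only looks at whether each path exists; it does not verify file sizes or checksums.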
The final folder will be shown as:
Please read our document on best practices and configuration tips.
Online Demo: HuggingFace | SDWebUI-Colab | Colab
Local WebUI extension: Please refer to WebUI docs.
Local gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:
## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.
python app_sadtalker.py
You can also start it more easily:
- Windows: just double-click `webui.bat`; the requirements will be installed automatically.
- Linux/macOS: run `bash webui.sh` to start the WebUI.
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--enhancer gfpgan
The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
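Because each run writes into a new timestamped folder, it can be handy to locate the newest video programmatically. The sketch below assumes the `results/<timestamp>/<name>.mp4` layout described above; `latest_result` is a hypothetical helper, and it sorts by file modification time rather than parsing the folder name, since the exact timestamp format is not stated here.

```python
from pathlib import Path


def latest_result(results_dir="results"):
    """Return the most recently modified .mp4 under results_dir, or None.

    Assumes the layout results/<timestamp>/<name>.mp4.
    """
    videos = list(Path(results_dir).glob("*/*.mp4"))
    if not videos:
        return None
    # Compare modification times instead of parsing the folder names.
    return max(videos, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    newest = latest_result()
    print(newest if newest is not None else "No results found.")
```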
Use `--still` to generate a natural full-body video. You can add `--enhancer` to improve the quality of the generated video.
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <a directory to store results> \
--still \
--preprocess full \
--enhancer gfpgan
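When generating many videos, the command above can be assembled from Python instead of being typed by hand. This is a minimal sketch that uses only the flags shown in this section; `build_inference_command` and `run_inference` are hypothetical helper names, not part of SadTalker, and the script is assumed to be started from the repository root so that `inference.py` resolves.

```python
import subprocess
import sys


def build_inference_command(driven_audio, source_image, result_dir=None,
                            still=False, preprocess=None, enhancer=None):
    """Assemble the inference.py command line from the flags shown above."""
    cmd = [sys.executable, "inference.py",
           "--driven_audio", str(driven_audio),
           "--source_image", str(source_image)]
    if result_dir is not None:
        cmd += ["--result_dir", str(result_dir)]
    if still:
        cmd.append("--still")
    if preprocess is not None:
        cmd += ["--preprocess", preprocess]
    if enhancer is not None:
        cmd += ["--enhancer", enhancer]
    return cmd


def run_inference(**kwargs):
    """Run inference.py in the current directory; raises on a non-zero exit."""
    subprocess.run(build_inference_command(**kwargs), check=True)
```

For example, `run_inference(driven_audio="audio.wav", source_image="picture.png", still=True, preprocess="full", enhancer="gfpgan")` mirrors the full-body command above, with placeholder file names. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.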
More examples, configuration options, and tips can be found in the best practice documents.
We also use the following 3rd-party libraries:
- Face Utils: https://github.com/xinntao/facexlib
- Face Enhancement: https://github.com/TencentARC/GFPGAN
- Image/Video Enhancement: https://github.com/xinntao/Real-ESRGAN