AI-Avatar-RealTime: A Jupyter Notebook repository from PrashantDixit0

Documentation

1. Installation

Linux/Unix

Install Anaconda, Python and git.
Create the environment and install the requirements.

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### Coqui TTS is optional for gradio demo. 
### pip install TTS
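Before running the steps above, it can help to confirm that the prerequisite tools are reachable from the shell. The following is a minimal sketch, not part of the SadTalker repository: the helper name `find_missing_tools` is hypothetical, and it only checks that `git`, `conda`, and `ffmpeg` are on PATH. Because ffmpeg is installed with `conda install ffmpeg`, run the check inside the activated `sadtalker` environment.

```python
import shutil


def find_missing_tools(tools=("git", "conda", "ffmpeg")):
    """Return the names of command-line tools that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```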

Windows

A video tutorial in Chinese is available here. You can also follow the instructions below:

Install Python 3.8 and check "Add Python to PATH".
Install git manually or using Scoop: scoop install git.
Install ffmpeg, following this tutorial or using scoop: scoop install ffmpeg.
Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
Download the checkpoints and gfpgan models in the downloads section.
Run start.bat from Windows Explorer as a normal (non-administrator) user, and a Gradio-powered WebUI demo will start.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

bash scripts/download_models.sh

We also provide an offline patch (gfpgan/), so no model will be downloaded when generating.

Pre-Trained Models

GFPGAN Offline Patch

Model Details

Model descriptions:

New version

Model	Description
checkpoints/mapping_00229-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/SadTalker_V0.0.2_256.safetensors	Packaged SadTalker checkpoints of the old version (256 face render).
checkpoints/SadTalker_V0.0.2_512.safetensors	Packaged SadTalker checkpoints of the old version (512 face render).
gfpgan/weights	Face detection and enhancement models used in `facexlib` and `gfpgan`.
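To confirm that a download finished, the paths in the table above can be checked against the local checkout. This is a minimal sketch under stated assumptions: `missing_models` and `NEW_VERSION_MODELS` are hypothetical names, not part of SadTalker, and the script only tests that each path exists, not that the files are complete or uncorrupted.

```python
from pathlib import Path

# Paths copied from the "New version" table above.
NEW_VERSION_MODELS = (
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "gfpgan/weights",
)


def missing_models(repo_root=".", expected=NEW_VERSION_MODELS):
    """Return the expected model paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files or folders:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected model paths are present.")
```

Run it from the root of the SadTalker checkout. The same function should work for the old-version layout if you pass that table's paths as `expected`.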

Old version

Model	Description
checkpoints/auido2exp_00300-model.pth	Pre-trained ExpNet in SadTalker.
checkpoints/auido2pose_00140-model.pth	Pre-trained PoseVAE in SadTalker.
checkpoints/mapping_00229-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar	Pre-trained MappingNet in SadTalker.
checkpoints/facevid2vid_00189-model.pth.tar	Pre-trained face-vid2vid model from the reproduction of face-vid2vid.
checkpoints/epoch_20.pth	Pre-trained 3DMM extractor in Deep3DFaceReconstruction.
checkpoints/wav2lip.pth	Highly accurate lip-sync model in Wav2Lip.
checkpoints/shape_predictor_68_face_landmarks.dat	Face landmark model used in dlib.
checkpoints/BFM	3DMM library file.
checkpoints/hub	Face detection models used in face alignment.
gfpgan/weights	Face detection and enhancement models used in `facexlib` and `gfpgan`.

The final folder will be shown as:

3. Quick Start

Please read our document on best practices and configuration tips.

WebUI Demos

Online Demo: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: Please refer to WebUI docs.

Local Gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:

## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.
python app_sadtalker.py

You can also start it more easily:

Windows: double-click webui.bat; the requirements will be installed automatically.
Linux/macOS: run bash webui.sh to start the WebUI.

CLI usage

Animating a portrait image from default config:

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
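Since the output folder name is a timestamp that is not known in advance, a script that consumes the video has to look it up. The sketch below is one way to do that, assuming only the `results/` layout described above; `latest_result` is a hypothetical helper, not part of SadTalker, and it picks the newest file by modification time.

```python
from pathlib import Path


def latest_result(result_dir="results"):
    """Return the most recently modified .mp4 under result_dir, or None."""
    # rglob searches every subfolder, so the timestamp name need not be known.
    videos = list(Path(result_dir).rglob("*.mp4"))
    return max(videos, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    video = latest_result()
    print(video if video is not None else "No generated video found.")
```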

Full body/image Generation:

Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
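When calling the CLI from another Python program, it can be convenient to assemble the argument list in one place. The following is a minimal sketch that uses only the flags shown in the commands above; `build_inference_command` is a hypothetical helper, not part of SadTalker, and the file names in the usage comment are placeholders.

```python
import sys


def build_inference_command(driven_audio, source_image, result_dir=None,
                            still=False, preprocess=None, enhancer=None):
    """Build the argument list for inference.py from the documented flags."""
    cmd = [sys.executable, "inference.py",
           "--driven_audio", str(driven_audio),
           "--source_image", str(source_image)]
    if result_dir is not None:
        cmd += ["--result_dir", str(result_dir)]
    if still:
        cmd.append("--still")
    if preprocess is not None:
        cmd += ["--preprocess", preprocess]
    if enhancer is not None:
        cmd += ["--enhancer", enhancer]
    return cmd


# Example (placeholder file names), run from the SadTalker checkout:
#   import subprocess
#   cmd = build_inference_command("speech.wav", "portrait.png",
#                                 still=True, preprocess="full",
#                                 enhancer="gfpgan")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.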

More examples, configuration options, and tips can be found in the best practice documents.

Citations

We also use the following 3rd-party libraries:

Face Utils: https://github.com/xinntao/facexlib
Face Enhancement: https://github.com/TencentARC/GFPGAN
Image/Video Enhancement: https://github.com/xinntao/Real-ESRGAN
