StyleTalk

The official repository of the AAAI 2023 paper StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Paper | Supp. Materials | Video

The proposed StyleTalk generates talking head videos with speaking styles specified by arbitrary style reference videos.

News

  • April 14th, 2023. The code is available.

Get Started

Installation

Clone this repo, install conda, and run:

conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg

The code has been tested with CUDA 11.1 on an RTX 3090 GPU.
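
To confirm the environment is set up correctly before running anything, a quick sanity check like the following can help (a minimal sketch, not part of the repo):

# check_env.py -- minimal environment sanity check (not part of the repo)
import torch

print(torch.__version__)          # expected: 1.8.0
print(torch.version.cuda)         # expected: 11.1
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine

If the last line prints False, the CUDA toolkit version likely does not match the installed PyTorch build.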

Data Preprocessing

Our method takes 3DMM parameters (*.mat) and phoneme labels (*_seq.json) as input. Follow PIRenderer to extract 3DMM parameters, and follow AVCT to extract phoneme labels. Some preprocessed data can be found in the samples folder.
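
To get a feel for the expected inputs, the preprocessed files in samples can be inspected with a short script like this (a rough sketch, assuming the *.mat files are standard MATLAB files readable by scipy and the *_seq.json files are plain JSON):

# inspect_inputs.py -- rough sketch for peeking at the preprocessed inputs;
# assumes standard MATLAB .mat files and plain-JSON phoneme files.
import json
from scipy.io import loadmat

mat = loadmat("samples/style_clips/3DMM/happyenglish_clip1.mat")
print(mat.keys())  # field names of the extracted 3DMM parameters

with open("samples/source_video/phoneme/reagan_clip1_seq.json") as f:
    phonemes = json.load(f)
print(type(phonemes))  # structure of the phoneme label sequence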

Inference

Download checkpoints for StyleTalk and Renderer and put them into ./checkpoints.

Run the demo:

python inference_for_demo.py \
--audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
--style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
--pose_path samples/source_video/3DMM/reagan_clip1.mat \
--src_img_path samples/source_video/image/andrew_clip_1.png \
--wav_path samples/source_video/wav/reagan_clip1.wav \
--output_path demo.mp4

Change audio_path, style_clip_path, pose_path, src_img_path, wav_path, and output_path to generate more results.
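
To produce several results in one go, a small driver script can loop over input combinations (a sketch, assuming inference_for_demo.py accepts exactly the flags shown in the demo command above):

# batch_demo.py -- sketch of a batch driver; assumes inference_for_demo.py
# accepts exactly the flags used in the demo command above.
import subprocess

style_clips = [
    "samples/style_clips/3DMM/happyenglish_clip1.mat",
    # add more style reference clips here
]

for i, style in enumerate(style_clips):
    subprocess.run([
        "python", "inference_for_demo.py",
        "--audio_path", "samples/source_video/phoneme/reagan_clip1_seq.json",
        "--style_clip_path", style,
        "--pose_path", "samples/source_video/3DMM/reagan_clip1.mat",
        "--src_img_path", "samples/source_video/image/andrew_clip_1.png",
        "--wav_path", "samples/source_video/wav/reagan_clip1.wav",
        "--output_path", f"demo_{i}.mp4",
    ], check=True)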

Acknowledgement

Some code is borrowed from other projects, including PIRenderer and AVCT. Thanks for their contributions!