The official repository of the AAAI 2023 paper StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles
Paper | Supp. Materials | Video
The proposed StyleTalk can generate talking head videos with speaking styles specified by arbitrary style reference videos.
- April 14th, 2023. The code is available.
Clone this repo, install conda and run:
conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg
The code has been tested with CUDA 11.1 on an RTX 3090 GPU.
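A quick way to confirm the PyTorch/CUDA install (a generic sanity check, not part of this repo):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
On a correctly configured GPU machine this should print 1.8.0 and True.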
Our method takes 3DMM parameters (*.mat) and phoneme labels (*_seq.json) as input. Follow PIRenderer to extract the 3DMM parameters and AVCT to extract the phoneme labels. Some preprocessed data can be found in the samples folder.
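To sanity-check the preprocessed inputs before running inference, here is a minimal Python sketch (it requires scipy; the paths come from the samples folder, and the variable names inside the .mat file depend on the PIRenderer extraction, so print them rather than assuming them):
import json
from scipy.io import loadmat

# List the variables stored in a style clip's 3DMM parameter file.
params = loadmat("samples/style_clips/3DMM/happyenglish_clip1.mat")
print([k for k in params if not k.startswith("__")])

# Load the frame-aligned phoneme labels produced by the AVCT preprocessing.
with open("samples/source_video/phoneme/reagan_clip1_seq.json") as f:
    phonemes = json.load(f)
print(type(phonemes))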
Download the checkpoints for StyleTalk and the Renderer and put them into ./checkpoints.
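For example (the source path below is a placeholder; keep the filenames exactly as provided by the download links):
mkdir -p checkpoints
mv /path/to/downloaded_checkpoints/* checkpoints/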
Run the demo:
python inference_for_demo.py \
--audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
--style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
--pose_path samples/source_video/3DMM/reagan_clip1.mat \
--src_img_path samples/source_video/image/andrew_clip_1.png \
--wav_path samples/source_video/wav/reagan_clip1.wav \
--output_path demo.mp4
Change audio_path, style_clip_path, pose_path, src_img_path, wav_path, and output_path to generate more results.
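For instance, to sweep several speaking styles over the same source, a simple shell loop works (a sketch that reuses only the flags shown above and assumes the sample layout shipped with this repo):
for style in samples/style_clips/3DMM/*.mat; do
python inference_for_demo.py \
--audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
--style_clip_path "$style" \
--pose_path samples/source_video/3DMM/reagan_clip1.mat \
--src_img_path samples/source_video/image/andrew_clip_1.png \
--wav_path samples/source_video/wav/reagan_clip1.wav \
--output_path "demo_$(basename "$style" .mat).mp4"
done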
Some code is borrowed from the following projects:
Thanks for their contributions!