This is the code repository implementing the paper:
MakeItTalk: Speaker-Aware Talking-Head Animation
Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li
SIGGRAPH Asia 2020
Abstract: We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.
[Project page] [Paper] [Video] [Arxiv] [Colab Demo] [Colab Demo TL;DR]
- Create environment and activate it.
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
- Install the FFmpeg tool.
sudo apt-get install ffmpeg
- Install all the relevant packages.
pip install -r requirements.txt
- Wine is not needed for this implementation; the dependency has been removed.
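As a quick sanity check after installation, the following minimal sketch reports whether the `ffmpeg` executable is visible on the PATH. The helper name `find_tool` is hypothetical and not part of this repository; it only wraps the standard library's `shutil.which`.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    if path:
        print("ffmpeg found at " + path)
    else:
        print("ffmpeg not found; install it before running the project")
```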
Download the following pre-trained models to the models/ folder to test your own animation.
| Model | Link to the model |
|---|---|
| Voice Conversion | Link |
| Speech Content Module | Link |
| Speaker-aware Module | Link |
| Image2Image Translation Module | Link |
Download the pre-trained embedding [here] and save it to the models/dump folder.
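Before running anything, it can help to confirm that the download folders exist and are not empty. The sketch below is an assumption-laden convenience, not part of this repository: the helper `missing_or_empty` is hypothetical, and it checks only for the presence of entries, not for specific model file names (which this README does not list).

```python
import os


def missing_or_empty(dirs):
    """Return the entries of `dirs` that are not directories or that contain nothing."""
    problems = []
    for d in dirs:
        if not os.path.isdir(d) or not os.listdir(d):
            problems.append(d)
    return problems


if __name__ == "__main__":
    # Folder names taken from the instructions above.
    for d in missing_or_empty(["models", os.path.join("models", "dump")]):
        print("Missing or empty:", d)
```

Note that models/ counts as non-empty once models/dump exists inside it, so this is only a coarse check.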
Connect to the machine using Chrome Remote Desktop (https://remotedesktop.google.com/access/). Follow all the instructions there to install it and access your GCP machine.
To produce samples:
- Place the source files generated by the Landmarking Tool into ./input/character_data/
- Remove all audio files from ./input/audio/ and add only the one to be used.
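Since the audio folder is expected to hold a single file, a small pre-flight check can catch leftovers. This is a minimal sketch under that assumption; `only_audio_file` is a hypothetical helper, and it counts every regular file in the folder (including hidden ones) because the accepted audio extensions are not stated here.

```python
import os


def only_audio_file(audio_dir):
    """Return the single file name in `audio_dir`; raise ValueError otherwise."""
    files = sorted(
        f for f in os.listdir(audio_dir)
        if os.path.isfile(os.path.join(audio_dir, f))
    )
    if len(files) != 1:
        raise ValueError(
            "expected exactly one file in %s, found %d" % (audio_dir, len(files))
        )
    return files[0]
```

For example, `only_audio_file("./input/audio/")` returns the file name when the folder is set up as described, and raises otherwise.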
In main.py, change the following:
- char_name: the name of the image file without the .jpg extension.
- image_input_dir: the input image directory.
- audio_name: the name of the audio file to be used, without its extension.
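Both names are given without their extensions. As an illustration only, the sketch below derives such values from file paths with `os.path.splitext`; the file names `portrait.jpg` and `speech.wav` and the helper `stem` are hypothetical, and how main.py consumes these variables is not shown here.

```python
import os


def stem(filename):
    """Strip the directory part and the final extension from a file name."""
    return os.path.splitext(os.path.basename(filename))[0]


# Hypothetical example files; substitute your own.
char_name = stem("input/character_data/portrait.jpg")
audio_name = stem("input/audio/speech.wav")
print(char_name, audio_name)
```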
Run the following command in the root directory of the project.
python main.py
https://drive.google.com/drive/folders/1p9-LWWVvVxB31GEYU-9aQ89KYqOlaVbf?usp=sharing