
A deep learning pipeline that animates Bored Ape NFTs using the speaker-aware talking-head animation technique (MakeItTalk).


BAYC-Animated-BoredApes: Speaker-Aware Talking-Head Animation

This is the code repository implementing the paper:

MakeItTalk: Speaker-Aware Talking-Head Animation

Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

SIGGRAPH Asia 2020

Abstract

We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

[Project page] [Paper] [Video] [Arxiv] [Colab Demo] [Colab Demo TDLR]


Installation:

  1. Create the environment and activate it.
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
  2. Install the FFmpeg tool.
sudo apt-get install ffmpeg
  3. Install all the required packages.
pip install -r requirements.txt
  4. Wine is not needed for this implementation; it has been removed.
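Before downloading the models, you can sanity-check the environment. The snippet below is a minimal sketch (not part of the repository) and assumes PyTorch is among the packages listed in requirements.txt.

```python
# Quick environment sanity check -- a sketch; adjust package names to your requirements.txt.
import shutil
import sys

def check_environment():
    # ffmpeg must be reachable on PATH for the audio/video processing steps.
    if shutil.which("ffmpeg") is None:
        sys.exit("ffmpeg not found -- install it with `sudo apt-get install ffmpeg`.")
    try:
        import torch  # assumed to be listed in requirements.txt
    except ImportError:
        sys.exit("PyTorch is missing -- run `pip install -r requirements.txt` first.")
    print("Environment looks OK (ffmpeg found, torch importable).")

if __name__ == "__main__":
    check_environment()
```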

Download the following pre-trained models into the models/ folder to test your own animations.

| Model | Link to the model |
| --- | --- |
| Voice Conversion | Link |
| Speech Content Module | Link |
| Speaker-aware Module | Link |
| Image2Image Translation Module | Link |

Download the pre-trained embeddings [here] and save them to the models/dump folder.
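To confirm the downloads landed in the right place, a small hypothetical check like the one below can help. The exact checkpoint filenames are not listed here, so it only verifies that models/ and models/dump exist and are non-empty.

```python
# Sketch: verify that the pre-trained model folders exist and contain files.
from pathlib import Path

def check_pretrained(root="models"):
    for folder in (Path(root), Path(root) / "dump"):
        files = [p for p in folder.glob("*") if p.is_file()] if folder.is_dir() else []
        status = f"{len(files)} file(s)" if files else "MISSING OR EMPTY"
        print(f"{folder}: {status}")

check_pretrained()
```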

Usage Details

Connect to the machine using Chrome Remote Desktop (https://remotedesktop.google.com/access/). Follow the instructions there to install it and access your GCP machine through Chrome Remote Desktop.

To produce samples:

  1. Place the source files generated by the Landmarking Tool into ./input/character_data/.
  2. Remove all audio files from ./input/audio/ and add only the one to be used (see the optional check after this list).
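The following optional sketch (not part of the repository) verifies the input layout described above: character data present and exactly one audio file in ./input/audio/.

```python
# Sketch: check the input folders before running the pipeline.
from pathlib import Path

char_files = [p for p in Path("./input/character_data/").glob("*") if p.is_file()]
audio_files = [p for p in Path("./input/audio/").glob("*") if p.is_file()]
print(f"character_data: {len(char_files)} file(s)")
print(f"audio: {[p.name for p in audio_files]}")
assert len(audio_files) == 1, "Keep only the audio file to be used in ./input/audio/"
```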

In main.py, change the following (an example sketch follows the list):

  1. char_name to the name of the image file, without the .jpg extension.
  2. image_input_dir to point to your character data directory.
  3. audio_name to the name of the audio file, without the extension.
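For example, the edited variables in main.py might look like the sketch below. The values are placeholders; only the variable names come from the steps above.

```python
# Placeholder values -- replace with your own character and audio names.
char_name = "bored_ape_1234"                 # image file name without the .jpg extension
image_input_dir = "./input/character_data/"  # folder holding the Landmarking Tool output
audio_name = "audio-to-be-used"              # audio file name without the extension
```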

Run the following command in the root directory of the project.

python main.py


Samples

https://drive.google.com/drive/folders/1p9-LWWVvVxB31GEYU-9aQ89KYqOlaVbf?usp=sharing