Speech-Driven Facial Animation with Spectral Gathering and Temporal Attention

Install dependencies

Necessary libraries:

# install Python dependencies
$ python3 -m pip install -r requirements.txt
# install CMake and libsndfile
$ sudo apt install libsndfile1 cmake
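Missing system packages are a common reason for failures later on, so a quick check for them can help. The sketch below is not part of the repository. It assumes that libsndfile can be found through the standard library search path (`ctypes.util.find_library`) and that `cmake` is on `PATH`.

```python
import ctypes.util
import shutil
from typing import Dict


def check_system_deps() -> Dict[str, bool]:
    """Report whether the system-level dependencies appear to be installed.

    Assumption: libsndfile is discoverable via ctypes.util.find_library
    and the cmake executable is on PATH.
    """
    return {
        "cmake": shutil.which("cmake") is not None,
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry points to the `apt install` step. It does not point to the pip requirements.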

(Optional) To prepare the dataset yourself, you must install Montreal Forced Aligner (MFA). Some installation steps may fail, so check the output of each script carefully.

$ bash scripts/install_mkl.sh
$ bash scripts/install_kaldi.sh
$ bash scripts/install_mfa.sh

Evaluate

Download the pretrained model from Google Drive, unzip it, and place it in ./pretrained_models/dgrad.

Edit the evaluation script for your setup, then run it with bash evaluate.sh.
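You can confirm that the model was unzipped into the expected place before running the script. This is a minimal sketch and is not part of the repository. The file names inside ./pretrained_models/dgrad are not documented here, so it only checks that the directory exists and is not empty.

```python
from pathlib import Path


def pretrained_model_ready(root: str = "./pretrained_models/dgrad") -> bool:
    """Return True if the pretrained-model directory exists and is non-empty.

    The expected file names are not documented, so this only checks
    that something was unzipped into place.
    """
    path = Path(root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if not pretrained_model_ready():
        raise SystemExit("Unzip the pretrained model into ./pretrained_models/dgrad first.")
```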

Prepare VOCASET

Download VOCASET from https://voca.is.tue.mpg.de/ and unzip it so that the directories are arranged as follows:

VOCASET/
 |- unposedcleaneddata/
 |- sentencestext/
 |- templates/
 |- audio/
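Incomplete unzipping is easy to miss, so it can help to check the layout before running the preload step. The sketch below assumes that `<ROOT_VOCASET>` is the `VOCASET` directory shown in the tree. The function name `missing_voca_dirs` is hypothetical and does not come from the repository.

```python
from pathlib import Path
from typing import List

# Sub-directories listed in the VOCASET layout.
EXPECTED_SUBDIRS = ("unposedcleaneddata", "sentencestext", "templates", "audio")


def missing_voca_dirs(source_root: str) -> List[str]:
    """Return the expected VOCASET sub-directories that are absent under source_root."""
    root = Path(source_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    missing = missing_voca_dirs(sys.argv[1])
    print("layout OK" if not missing else f"missing: {', '.join(missing)}")
```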

Run the preload script:

$ python3 -m saberspeech.datasets.voca.preload \
    --source_root <ROOT_VOCASET> \
    --output_root <ROOT_PROCESSED>
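You may want to drive preprocessing from another Python script. The sketch below calls the same module through `subprocess` and passes the same two arguments. It uses the current interpreter (`sys.executable`) in place of `python3`. The helper names are hypothetical, and the preload module may accept options beyond the two shown here.

```python
import subprocess
import sys
from typing import List


def build_preload_cmd(source_root: str, output_root: str) -> List[str]:
    """Build the preload command line using the current Python interpreter."""
    return [
        sys.executable, "-m", "saberspeech.datasets.voca.preload",
        "--source_root", source_root,
        "--output_root", output_root,
    ]


def run_preload(source_root: str, output_root: str) -> None:
    # check=True raises CalledProcessError if preload exits with a non-zero status.
    subprocess.run(build_preload_cmd(source_root, output_root), check=True)
```

For example, `run_preload("/data/VOCASET", "/data/voca_processed")` would process a dataset unzipped at the (hypothetical) path `/data/VOCASET`.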

Pretrained models

dgrad
offsets
PCA of dgrad and offsets
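This README does not describe how the PCA models are stored or computed. The sketch below shows only the generic technique behind a PCA basis over per-frame features, using NumPy's SVD. It assumes that each frame is flattened to one vector, for example vertex offsets from the template mesh or flattened deformation gradients. It does not reproduce the repository's file format or training code. The data in the usage block is random, and its shape (5023 vertices × 3 coordinates, the FLAME topology used by VOCASET) is chosen only for illustration.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int):
    """Fit a PCA basis to samples of shape (num_frames, feature_dim).

    Returns the per-feature mean and the top principal directions with
    shape (n_components, feature_dim), taken from the SVD of centered data.
    """
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    # Coefficients of each frame in the PCA basis: (num_frames, n_components).
    return (features - mean) @ components.T


def reconstruct(coeffs: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    # Map coefficients back to the original feature space.
    return coeffs @ components + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 100 frames of flattened per-vertex offsets.
    offsets = rng.normal(size=(100, 5023 * 3))
    mean, comps = fit_pca(offsets, n_components=50)
    coeffs = project(offsets, mean, comps)
    approx = reconstruct(coeffs, mean, comps)
    print(coeffs.shape, approx.shape)
```

Keeping only the leading components gives a compact, low-dimensional target for a network to predict. The cost is some reconstruction error.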

Citation