UniTalker


UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model [ECCV2024]

Useful Links

[Homepage]      [arXiv]      [Video]     
UniTalker Architecture

UniTalker generates realistic facial motion from diverse audio domains, including clean and noisy speech in various languages, text-to-speech-generated audio, and even noisy songs with background music.

UniTalker can output motion in multiple annotation formats from a single unified model.

For datasets with new annotations, one can simply plug new heads into UniTalker and train it with the existing datasets or solely with the new ones, avoiding retopology.
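The multi-head idea can be pictured with a short PyTorch sketch. This is only an illustration of the principle, not the actual UniTalker implementation: a shared audio encoder trunk feeds one output head per annotation type, so supporting a new annotation amounts to registering one new head (the head names and sizes below are hypothetical).

  import torch
  import torch.nn as nn

  class MultiHeadDecoder(nn.Module):
      # Illustrative only: one shared feature trunk, one head per annotation type.
      def __init__(self, hidden_dim, head_dims):
          super().__init__()
          # head_dims maps head name -> output size, e.g.
          # {"vocaset_vertices": 5023 * 3, "blendshape_weights": 52} (hypothetical)
          self.heads = nn.ModuleDict(
              {name: nn.Linear(hidden_dim, out_dim) for name, out_dim in head_dims.items()}
          )

      def forward(self, audio_features, head_name):
          # audio_features: (batch, frames, hidden_dim) from the shared encoder
          return self.heads[head_name](audio_features)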

Installation

Environment

  • Linux
  • Python 3.10
  • PyTorch 2.2.0
  • CUDA 12.1
  • transformers 4.39.3
  • PyTorch3D 0.7.7 (optional, only needed for rendering the results)

  conda create -n unitalker python==3.10
  conda activate unitalker
  conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
  pip install transformers==4.39.3 librosa tensorboardX
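To confirm the environment matches the versions above before moving on, a quick check like this can help (a minimal sketch, not part of the official setup):

  # Hypothetical sanity check, not shipped with UniTalker.
  import torch
  import transformers

  print("torch:", torch.__version__)                  # expect 2.2.0
  print("cuda:", torch.version.cuda)                  # expect 12.1
  print("cuda available:", torch.cuda.is_available())
  print("transformers:", transformers.__version__)    # expect 4.39.3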

Inference

Download checkpoints, PCA models and template resources

UniTalker-B-[D0-D7]: The base model in the paper. Download it and place it in "./pretrained_models".

UniTalker-L-[D0-D7]: The default model in the paper. We recommend running the pipeline end to end with the base model first.

unitalker_data_release_V1: The released datasets, PCA models, data-split JSON files and id-template NumPy arrays. Download and unzip it in the root of this repo.

use "git lfs pull" to get "./resources.zip" and "./test_audios.zip" and unzip it in this repo

Finally, these files should be organized as follows:

├── pretrained_models
│   ├── UniTalker-B-D0-D7.pt
│   ├── UniTalker-L-D0-D7.pt
├── resources
│   ├── binary_resources
│   │   ├── 02_flame_mouth_idx.npy
│   │   ├── ...
│   │   └── vocaset_FDD_wo_eyes.npy
│   └── obj_template
│       ├── 3DETF_blendshape_weight.obj
│       ├── ...
│       └── meshtalk_6172_vertices.obj
├── unitalker_data_release_V1
│   ├── D0_BIWI
│   │   ├── id_template.npy
│   │   └── pca.npz
│   ├── D1_vocaset
│   │   ├── id_template.npy
│   │   └── pca.npz
│   ├── D2_meshtalk
│   │   ├── id_template.npy
│   │   └── pca.npz
│   ├── D3D4_3DETF
│   │   ├── D3_HDTF
│   │   └── D4_RAVDESS
│   ├── D5_unitalker_faceforensics++
│   │   ├── id_template.npy
│   │   ├── test
│   │   ├── test.json
│   │   ├── train
│   │   ├── train.json
│   │   ├── val
│   │   └── val.json
│   ├── D6_unitalker_Chinese_speech
│   │   ├── id_template.npy
│   │   ├── test
│   │   ├── test.json
│   │   ├── train
│   │   ├── train.json
│   │   ├── val
│   │   └── val.json
│   └── D7_unitalker_song
│       ├── id_template.npy
│       ├── test
│       ├── test.json
│       ├── train
│       ├── train.json
│       ├── val
│       └── val.json
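To sanity-check the download, the released arrays can be inspected with NumPy. The snippet below only prints keys and shapes, so it assumes nothing about the archive contents beyond the paths in the tree above:

  import numpy as np

  # Identity templates (presumably one neutral-face array per subject).
  template = np.load("unitalker_data_release_V1/D1_vocaset/id_template.npy")
  print("id_template:", template.shape, template.dtype)

  # PCA model stored as an .npz archive; list whatever arrays it contains.
  pca = np.load("unitalker_data_release_V1/D1_vocaset/pca.npz")
  for key in pca.files:
      print("pca:", key, pca[key].shape)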

Demo

  python -m main.demo --config config/unitalker.yaml test_out_path ./test_results/demo.npz
  python -m main.render ./test_results/demo.npz ./test_audios ./test_results/
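The first command writes the predicted motion to ./test_results/demo.npz; the second renders it (this is the step that requires PyTorch3D). If rendering is skipped, the raw predictions can still be inspected without assuming any key names:

  import numpy as np

  out = np.load("./test_results/demo.npz")
  for key in out.files:
      print(key, out[key].shape, out[key].dtype)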

Train

Download Data

unitalker_data_release_V1 contains D5, D6 and D7. These datasets have already been processed and split into train, validation and test sets. Please use these three datasets to try the training step first. If you want to train the model on all of D0-D7, download the remaining datasets from the following links: D0: BIWI. D1: VOCASET. D2: meshtalk. D3, D4: 3DETF.
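The split files in the release are plain JSON. A quick look at one of them confirms the data is in place; the snippet prints only the container type and size, since the exact schema is whatever the release ships:

  import json

  with open("unitalker_data_release_V1/D7_unitalker_song/train.json") as f:
      split = json.load(f)
  print(type(split).__name__, "with", len(split), "entries")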

Modify Config and Train

Please modify the "dataset" entry in "config/unitalker.yaml" to match the datasets you have prepared.
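Before launching training, it can help to confirm which datasets the config actually lists. A minimal check, assuming the top level of "config/unitalker.yaml" is a mapping with the "dataset" key mentioned above (PyYAML ships as a transformers dependency):

  import yaml

  with open("config/unitalker.yaml") as f:
      cfg = yaml.safe_load(f)
  print(cfg.get("dataset"))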

  python -m main.train --config config/unitalker.yaml