4D Facial Expression Diffusion Model
We test our method on two commonly used facial expression datasets, CoMA and BU-4DFE.
We perform conditional generation according to the expression label y.
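For illustration, the sketch below shows what ancestral sampling from a label-conditioned DDPM can look like in this setting. The denoiser name `eps_model`, the linear noise schedule, the sequence length, and the landmark dimension are assumptions made for the example, not the exact implementation.

```python
import torch

# Assumed linear noise schedule and step count; the actual settings may differ.
T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_from_label(eps_model, label, seq_len=40, dim=3 * 68):
    """Ancestral DDPM sampling of one landmark sequence conditioned on a label y."""
    x = torch.randn(1, seq_len, dim)                     # start from Gaussian noise
    y = torch.tensor([label])
    for t in reversed(range(T_STEPS)):
        t_batch = torch.full((1,), t, dtype=torch.long)
        eps = eps_model(x, t_batch, y)                   # noise prediction given y
        mean = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                             # generated x_0
```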
Examples
We also perform conditional generation according to a text description. Note that the input texts “disgust high smile” and “angry mouth down” are combinations of two terms used during training. For instance, “disgust high smile” is a new description that has not been seen before, obtained by combining “disgust” and “high smile”.
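One simple way to handle such unseen combinations is to embed each training term and combine the embeddings of the terms found in the description. The sketch below illustrates this idea; the term list, the embedding size, and the averaging are assumptions made for illustration, not the actual text encoder of the method.

```python
import torch
import torch.nn as nn

# Illustrative subset of training terms and an assumed embedding size.
TERMS = ["disgust", "angry", "high smile", "mouth down"]
term_to_idx = {t: i for i, t in enumerate(TERMS)}
term_embedding = nn.Embedding(len(TERMS), 128)

def encode_text(description: str) -> torch.Tensor:
    """Map a (possibly unseen) compound description to a conditioning vector
    by averaging the embeddings of the known terms it contains."""
    found = [t for t in TERMS if t in description]
    idx = torch.tensor([term_to_idx[t] for t in found])
    return term_embedding(idx).mean(dim=0)

cond = encode_text("disgust high smile")   # combines two terms seen separately in training
```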
Text-to-expression examples:
Similar to image inpainting, whose purpose is to predict the missing pixels of an image using the region outside the mask as a condition, this task aims to predict the missing frames of a temporal sequence by leveraging the known frames as a condition.
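A minimal sketch of such sequence filling, in the spirit of diffusion-based image inpainting, is given below: at every reverse step the known frames are re-imposed after being diffused to the current noise level, so that only the missing frames are effectively generated. The denoiser `eps_model`, the mask convention, and the schedule are assumptions carried over from the sketch above.

```python
import torch

# Assumed schedule, identical to the label-conditioned sketch above.
T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def fill_sequence(eps_model, x_known, mask, cond=None):
    """x_known: (1, T, D) sequence with arbitrary values at missing frames;
    mask: (1, T, 1), 1 where the frame is known, 0 where it must be predicted."""
    x = torch.randn_like(x_known)
    for t in reversed(range(T_STEPS)):
        t_batch = torch.full((1,), t, dtype=torch.long)
        # Re-impose the known frames, diffused forward to the current step.
        noised_known = (torch.sqrt(alphas_bar[t]) * x_known
                        + torch.sqrt(1.0 - alphas_bar[t]) * torch.randn_like(x_known))
        x = mask * noised_known + (1.0 - mask) * x
        # Standard reverse step on the whole sequence.
        eps = eps_model(x, t_batch, cond)
        mean = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return mask * x_known + (1.0 - mask) * x     # keep the known frames exactly
```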
The specific aim of 3D facial animation generation is to learn a model that can generate facial expressions that are realistic, appearance-preserving, and rich in diversity, and that can be conditioned in various ways.
The diversity of the generated sequences in terms of expression is shown hereafter. The meshes are obtained by retargeting the expressions of the generated 𝑥₀ onto the same neutral faces (a toy sketch of such retargeting is given after the examples).
Examples of generated sequences for the expressions “mouth side” and “mouth up”.
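The retargeting step can be pictured with the toy sketch below, which spreads the displacement of each generated landmark over the mesh using Gaussian radial-basis weights. This is a deliberate simplification standing in for the landmark-guided mesh deformation, only meant to illustrate the idea; the weighting scheme and the bandwidth `sigma` are assumptions.

```python
import torch

def retarget(neutral_vertices, neutral_landmarks, generated_landmarks, sigma=0.05):
    """Propagate landmark displacements to all mesh vertices with Gaussian
    radial-basis weights.  neutral_vertices: (V, 3); *_landmarks: (L, 3)."""
    d2 = ((neutral_vertices[:, None, :] - neutral_landmarks[None, :, :]) ** 2).sum(-1)
    w = torch.exp(-d2 / (2.0 * sigma ** 2))
    w = w / (w.sum(dim=1, keepdim=True) + 1e-8)          # (V, L), rows sum to 1
    delta = generated_landmarks - neutral_landmarks       # per-landmark displacement
    return neutral_vertices + w @ delta                   # deformed mesh vertices
```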
In the geometry-adaptive generation task, we generate a facial expression adapted to a given facial anatomy. This task can also be guided by a classifier. To benefit from the consistent, high-quality expressions that the DDPM adapts to the facial morphology, one can extract a landmark set 𝐿 from a mesh 𝑀, perform the geometry-adaptive task on it to generate a sequence involving 𝐿, and retarget it to 𝑀 by the landmark-guided mesh deformation. We show hereafter the diversity of the generated sequences; a sketch of this pipeline follows the examples below.
Examples of generated sequences for the expressions “eyebrow” and “lips up”.
"high smile"
"cheeks in"
"mouth open"
A landmark sequence extracted from a sequence of the CoMA dataset is retargeted onto several facial meshes.
The code will be made available very soon!