Copyright (c) 2023 Haiba Labs
Author: James Ritts <james@haibalabs.com>
This notebook (mediapipe_face_mesh_to_blendshapes.ipynb) trains a simple PyTorch model to map from MediaPipe face mesh landmarks to ARKit-compatible blendshapes.
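As a minimal sketch of that mapping, assuming a plain MLP regressor (the input width and layer sizes below are illustrative placeholders, not the notebook's actual architecture):

```python
import torch.nn as nn

N_INPUT = 2 * 146   # placeholder: 2D components of the normalized patch landmarks;
                    # the real width is whatever convert_landmarks_to_model_input() produces
N_BLENDSHAPES = 52  # ARKit blendshape count (output order listed below)

# Illustrative MLP: normalized landmark offsets in, blendshape weights out.
model = nn.Sequential(
    nn.Linear(N_INPUT, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_BLENDSHAPES),
    nn.Sigmoid(),  # blendshape weights live in [0, 1]
)
```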
- We want to train on object-space geometry so the model doesn't have to learn what each facial pose looks like in every possible head orientation. Unfortunately, MediaPipe's output is given in a coordinate system that makes this difficult, and its mesh is stretched to conform to the silhouette of the face in the input image. The function normalize_landmarks() tries to undo these effects: the mesh is segmented into mouth, left-eye, and right-eye patches; a basis is built for each patch from selected quads to reorient the patch towards the camera; then the vertices are projected to the XY plane and their components rescaled to [0, 1] for model input.
- The function convert_landmarks_to_model_input() uses normalize_landmarks() to convert raw MediaPipe output into the NN input vector. This function must be ported to any environment where the model runs. Both functions are sketched after this list.
- MediaPipe can't signal every blendshape. The following should be forced to zero at runtime (and possibly others as well): jawForward, jawRight, jawLeft, mouthDimpleRight, mouthDimpleLeft, cheekPuff, tongueOut. See the masking sketch after the output-order list below.
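As a rough illustration of the two functions above, here is a minimal NumPy sketch; the patch landmark indices and quad choices are hypothetical placeholders, not the notebook's actual selections, and the real implementation may differ in detail:

```python
import numpy as np

# Hypothetical landmark-index sets; the notebook selects specific MediaPipe
# face-mesh indices for each patch and for the quad used to build its basis.
PATCHES = {
    "mouth":     {"verts": [0, 13, 14, 17],     "quad": [61, 291, 0, 17]},
    "left_eye":  {"verts": [33, 133, 159, 145], "quad": [33, 133, 159, 145]},
    "right_eye": {"verts": [263, 362, 386, 374], "quad": [263, 362, 386, 374]},
}

def _patch_basis(quad: np.ndarray) -> np.ndarray:
    """Build an orthonormal basis from a quad so the patch faces the camera."""
    x = quad[1] - quad[0]        # approximate "right" direction
    up = quad[3] - quad[2]       # approximate "up" direction
    z = np.cross(x, up)          # patch normal, toward the camera
    y = np.cross(z, x)           # re-orthogonalized "up"
    basis = np.stack([x, y, z])
    return basis / np.linalg.norm(basis, axis=1, keepdims=True)

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (468, 3) raw MediaPipe mesh -> concatenated normalized 2D patches."""
    out = []
    for patch in PATCHES.values():
        basis = _patch_basis(landmarks[patch["quad"]])
        verts = landmarks[patch["verts"]] @ basis.T   # reorient toward the camera
        xy = verts[:, :2]                             # project to the XY plane
        lo, hi = xy.min(axis=0), xy.max(axis=0)
        out.append((xy - lo) / (hi - lo + 1e-8))      # rescale to [0, 1]
    return np.concatenate(out)

def convert_landmarks_to_model_input(landmarks: np.ndarray) -> np.ndarray:
    """Raw MediaPipe output -> flat NN input vector (port this to the runtime)."""
    return normalize_landmarks(landmarks).ravel().astype(np.float32)
```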
The order of blendshape values in the model output is:
eyeBlinkRight, eyeLookDownRight, eyeLookInRight, eyeLookOutRight, eyeLookUpRight, eyeSquintRight, eyeWideRight, eyeBlinkLeft, eyeLookDownLeft, eyeLookInLeft, eyeLookOutLeft, eyeLookUpLeft, eyeSquintLeft, eyeWideLeft, jawForward, jawRight, jawLeft, jawOpen, mouthClose, mouthFunnel, mouthPucker, mouthRight, mouthLeft, mouthSmileRight, mouthSmileLeft, mouthFrownRight, mouthFrownLeft, mouthDimpleRight, mouthDimpleLeft, mouthStretchRight, mouthStretchLeft, mouthRollLower, mouthRollUpper, mouthShrugLower, mouthShrugUpper, mouthPressRight, mouthPressLeft, mouthLowerDownRight, mouthLowerDownLeft, mouthUpperUpRight, mouthUpperUpLeft, browDownRight, browDownLeft, browInnerUp, browOuterUpRight, browOuterUpLeft, cheekPuff, cheekSquintRight, cheekSquintLeft, noseSneerRight, noseSneerLeft, tongueOut
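For convenience at runtime, the output order above can be captured as a constant and the unsupported shapes masked out; a minimal sketch:

```python
import numpy as np

# Model output order, as listed above (52 ARKit blendshapes).
BLENDSHAPE_NAMES = (
    "eyeBlinkRight eyeLookDownRight eyeLookInRight eyeLookOutRight "
    "eyeLookUpRight eyeSquintRight eyeWideRight eyeBlinkLeft eyeLookDownLeft "
    "eyeLookInLeft eyeLookOutLeft eyeLookUpLeft eyeSquintLeft eyeWideLeft "
    "jawForward jawRight jawLeft jawOpen mouthClose mouthFunnel mouthPucker "
    "mouthRight mouthLeft mouthSmileRight mouthSmileLeft mouthFrownRight "
    "mouthFrownLeft mouthDimpleRight mouthDimpleLeft mouthStretchRight "
    "mouthStretchLeft mouthRollLower mouthRollUpper mouthShrugLower "
    "mouthShrugUpper mouthPressRight mouthPressLeft mouthLowerDownRight "
    "mouthLowerDownLeft mouthUpperUpRight mouthUpperUpLeft browDownRight "
    "browDownLeft browInnerUp browOuterUpRight browOuterUpLeft cheekPuff "
    "cheekSquintRight cheekSquintLeft noseSneerRight noseSneerLeft tongueOut"
).split()

# Shapes MediaPipe can't signal; force these to zero at runtime.
FORCE_ZERO = ["jawForward", "jawRight", "jawLeft", "mouthDimpleRight",
              "mouthDimpleLeft", "cheekPuff", "tongueOut"]
FORCE_ZERO_IDX = [BLENDSHAPE_NAMES.index(n) for n in FORCE_ZERO]

def zero_unsupported(pred: np.ndarray) -> np.ndarray:
    """Zero the channels MediaPipe cannot drive in a (..., 52) prediction."""
    out = pred.copy()
    out[..., FORCE_ZERO_IDX] = 0.0
    return out
```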
Training data has this folder structure:
- my_first_dataset
  - neutral.jpg
  - my_first_dataset.csv
  - my_first_dataset_000000.jpg
  - my_first_dataset_000001.jpg
  - my_first_dataset_000002.jpg
  - ...
- my_second_dataset
- sets.txt
The file sets.txt should contain the folder names of all training datasets:
my_first_dataset
my_second_dataset
...
Each dataset must have a calibration photo depicting a neutral facial expression: neutral.jpg. The model is trained on object-space offsets from the neutral pose.
Each dataset also has a CSV file containing a header row followed by labels (blendshape values) for each input image:
eyeBlinkRight,eyeLookDownRight,eyeLookInRight,eyeLookOutRight,eyeLookUpRight,eyeSquintRight,eyeWideRight,eyeBlinkLeft,eyeLookDownLeft,eyeLookInLeft,eyeLookOutLeft,eyeLookUpLeft,eyeSquintLeft,eyeWideLeft,jawForward,jawRight,jawLeft,jawOpen,mouthClose,mouthFunnel,mouthPucker,mouthRight,mouthLeft,mouthSmileRight,mouthSmileLeft,mouthFrownRight,mouthFrownLeft,mouthDimpleRight,mouthDimpleLeft,mouthStretchRight,mouthStretchLeft,mouthRollLower,mouthRollUpper,mouthShrugLower,mouthShrugUpper,mouthPressRight,mouthPressLeft,mouthLowerDownRight,mouthLowerDownLeft,mouthUpperUpRight,mouthUpperUpLeft,browDownRight,browDownLeft,browInnerUp,browOuterUpRight,browOuterUpLeft,cheekPuff,cheekSquintRight,cheekSquintLeft,noseSneerRight,noseSneerLeft,tongueOut
0.039,0.103,0.044,0.000,0.000,0.000,0.000,0.039,0.104,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.010,0.010,0.027,0.000,0.000,0.002,0.003,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.015,0.014,0.000,0.000,0.000,0.007,0.000,0.000,0.000,0.000,0.000
0.038,0.091,0.049,0.000,0.000,0.000,0.000,0.038,0.092,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.010,0.011,0.027,0.000,0.000,0.002,0.004,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.014,0.014,0.000,0.000,0.000,0.007,0.000,0.000,0.000,0.000,0.000
...
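A sketch of how this layout might be loaded for training, assuming the convert_landmarks_to_model_input() described above and a hypothetical run_mediapipe() helper that returns the (468, 3) landmark array for an image path:

```python
import csv
import pathlib
import numpy as np
import torch
from torch.utils.data import ConcatDataset, Dataset

class BlendshapeDataset(Dataset):
    """One dataset folder, per the layout above."""

    def __init__(self, root, name, run_mediapipe):
        folder = pathlib.Path(root) / name
        # The model is trained on offsets from the neutral calibration pose.
        neutral = convert_landmarks_to_model_input(run_mediapipe(folder / "neutral.jpg"))
        with open(folder / f"{name}.csv") as f:
            rows = list(csv.reader(f))[1:]              # skip the header row
        self.labels = np.array(rows, dtype=np.float32)  # (N, 52)
        self.inputs = np.stack([
            convert_landmarks_to_model_input(
                run_mediapipe(folder / f"{name}_{i:06d}.jpg")) - neutral
            for i in range(len(self.labels))])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return torch.from_numpy(self.inputs[i]), torch.from_numpy(self.labels[i])

def load_all(root, run_mediapipe):
    """Concatenate every dataset folder named in sets.txt (one name per line)."""
    names = (pathlib.Path(root) / "sets.txt").read_text().split()
    return ConcatDataset([BlendshapeDataset(root, n, run_mediapipe) for n in names])
```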
References:
- https://arxiv.org/pdf/2006.10962.pdf
- https://developers.googleblog.com/2020/09/mediapipe-3d-face-transform.html
- https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_geometry/data
- https://github.com/google-ai-edge/mediapipe/issues/2867
- https://stackoverflow.com/questions/69858216/mediapipe-facemesh-vertices-mapping
- https://github.com/Rassibassi/mediapipeFacegeometryPython/blob/main/face_geometry.py
- https://github.com/google/mediapipe/blob/a908d668c730da128dfa8d9f6bd25d519d006692/mediapipe/modules/face_geometry/data/canonical_face_model_uv_visualization.png
TODO:
- find ways to reduce head-rotation-driven blendshape error
- cull shapes from the NN output and training data with which MediaPipe points don't correlate
- cull training examples that don't signal shapes MediaPipe can detect