Audio2Face: A Python repository from app-johndpope

Audio to Face Blendshape

Implementation with PyTorch.

Reproduced by: 刘宇昂

Base model
- LSTM using MFCC audio features
- CNN (a simplified version of the reference NvidiaNet) using LPC audio features

Prerequisites

Python3
PyTorch v0.3.0
numpy
librosa & audiolazy
scipy
etc.

Files

Scripts to run
- main.py: change net name and set checkpoints folder to train different models
- test_model.py: generate blendshape sequences from pre-extracted audio features (requires audio features as input)
- synthesis.py: generate blendshapes directly from an input wav (requires the input audio path as an argument)
Classes
- models.py: Classes with LSTM and CNN (simplified NvidiaNet) model.
- models_testae.py: Advanced models with an autoencoder design.
- dataset.py: Class for loading dataset.
Input preprocessing
- misc/audio_mfcc.py: extract mfcc features from input wav files
- misc/audio_lpc.py: extract lpc features
- misc/combine.py: combine certain audio feature/blendshape files to obtain a single file for data loading
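
To illustrate what an LPC feature extractor computes for each audio frame, here is a minimal standard-library sketch of the autocorrelation method with the Levinson-Durbin recursion. It is not the code in misc/audio_lpc.py, which may well rely on audiolazy as listed in the prerequisites. The function names `autocorrelation` and `lpc_coefficients`, the frame length, and the order are assumptions chosen for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one audio frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients [1, a1, ..., a_order] for one frame.

    Convention: x[n] is predicted as -(a1*x[n-1] + ... + a_order*x[n-order]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy, shrinks at every step
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a


# Hypothetical test frame: each sample is half the previous one.
frame = [0.5 ** n for n in range(64)]
coeffs = lpc_coefficients(frame, order=2)
```

For this frame the first-order coefficient comes out very close to -0.5, which matches the rule that each sample is half of the one before it. In a real pipeline the same function would be applied to every windowed frame of the wav file.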

Usage

Input

To build your own dataset, preprocess your wav/blendshape pairs with misc/audio_mfcc.py or misc/audio_lpc.py. Then use misc/combine.py to merge the resulting feature/blendshape files into a single feature/blendshape file for data loading.
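
The combining step can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of misc/combine.py: it assumes each clip yields one feature array and one blendshape array with the same number of frames, and the helper name `combine_pairs` and the array sizes (13 features, 51 blendshape weights) are hypothetical.

```python
import numpy as np


def combine_pairs(pairs):
    """Concatenate per-clip (features, blendshapes) arrays along the frame axis.

    pairs: list of (features, blendshapes) tuples, one tuple per clip.
    Returns one feature array and one blendshape array covering all clips.
    """
    all_features, all_blendshapes = [], []
    for index, (features, blendshapes) in enumerate(pairs):
        # Every audio frame needs exactly one blendshape target.
        if len(features) != len(blendshapes):
            raise ValueError(
                "clip {}: {} feature frames but {} blendshape frames".format(
                    index, len(features), len(blendshapes)
                )
            )
        all_features.append(features)
        all_blendshapes.append(blendshapes)
    return (
        np.concatenate(all_features, axis=0),
        np.concatenate(all_blendshapes, axis=0),
    )


# Two hypothetical clips with 3 and 2 frames respectively.
clip_a = (np.zeros((3, 13)), np.zeros((3, 51)))
clip_b = (np.ones((2, 13)), np.ones((2, 51)))
features, blendshapes = combine_pairs([clip_a, clip_b])
```

The combined arrays could then be written with `np.save` so the dataset class reads them in one go. Whether the repository stores them in one file or two is not stated here.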

Training

Modify main.py: set the model to the one you need and specify the checkpoint folder.
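
As a rough picture of the two settings involved, the sketch below pairs a net name with its audio feature type (LSTM with MFCC, CNN with LPC, as listed under Base model) and derives a checkpoint folder from it. The names `FEATURES_FOR_NET` and `training_config` are hypothetical; the actual variables in main.py may look quite different.

```python
import os

# Hypothetical mapping; it mirrors the pairing described under "Base model".
FEATURES_FOR_NET = {"lstm": "mfcc", "cnn": "lpc"}


def training_config(net_name, checkpoint_root):
    """Return the settings one would edit before starting a training run."""
    if net_name not in FEATURES_FOR_NET:
        raise ValueError("unknown net name: {!r}".format(net_name))
    return {
        "net": net_name,
        "feature_type": FEATURES_FOR_NET[net_name],
        # One folder per model keeps checkpoints of different nets apart.
        "checkpoint_dir": os.path.join(checkpoint_root, net_name),
    }


config = training_config("lstm", "checkpoints")
```

Before training, the folder in `config["checkpoint_dir"]` would typically be created with `os.makedirs(..., exist_ok=True)`.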

Evaluation

Both test_model.py and synthesis.py can be used to generate blendshape sequences.
- test_model.py accepts extracted audio features (MFCC/LPC).
- synthesis.py takes a raw wav file as input.
- Supply the required arguments and the script will produce a blendshape test file.
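
The exact arguments of synthesis.py are not listed here, so the following is only a sketch of how a wav-to-blendshape script might accept them with `argparse`. The positional `wav_path`, the `--checkpoint` and `--output` flags, and the default output name are all hypothetical.

```python
import argparse


def build_parser():
    """Build a command-line parser for a wav-to-blendshape script (sketch)."""
    parser = argparse.ArgumentParser(
        description="Generate a blendshape sequence from a wav file."
    )
    parser.add_argument("wav_path", help="path to the input wav file")
    parser.add_argument(
        "--checkpoint",
        default=None,
        help="path to trained model weights (hypothetical flag)",
    )
    parser.add_argument(
        "--output",
        default="blendshape_test.txt",
        help="file to write the blendshape sequence to (hypothetical flag)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["speech.wav", "--checkpoint", "checkpoints/lstm/model.pth"]
)
```

Check the argument handling in synthesis.py itself for the real names before running it.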