PyTorch implementation for the paper EmoFace: Audio-driven Emotional 3D Face Animation.
- Linux
- Python 3.7.0
- PyTorch 1.9.0
- CUDA 11.3
As we cannot publish the full dataset yet, we provide one sample from the evaluation set.
- ACTOR: Original performance video of the actor.
- VALID_CTR: Extracted controller value sequence.
- VIDEO: MetaHuman animation generated from VALID_CTR.
- WAV: Audio clip of the corresponding video.
First, the dataloaders for training and testing need to be generated with data.py; the path to the dataset needs to be assigned inside the script.
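Below is a minimal sketch of the kind of dataset wrapper a dataloader could be built from; the class and variable names are hypothetical and only illustrate the expected (audio feature, controller value) pairing, not the actual code in data.py.

```python
# Hypothetical sketch: pair audio features with controller-value frames
# and wrap them for PyTorch. Adjust to the structures data.py actually produces.
import torch
from torch.utils.data import Dataset, DataLoader

class CtrSequenceDataset(Dataset):
    """Hypothetical dataset of (audio feature, controller values) pairs."""
    def __init__(self, samples):
        self.samples = samples  # list of (audio_feature, controller_values) tuples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        audio, ctr = self.samples[idx]
        return (torch.as_tensor(audio, dtype=torch.float32),
                torch.as_tensor(ctr, dtype=torch.float32))

# Example usage once samples have been extracted from the dataset path:
# loader = DataLoader(CtrSequenceDataset(samples), batch_size=1, shuffle=True)
```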
Training and testing are combined in main.py. To run the model with the default settings, you only need to set the maximum number of epochs:
python main.py --max_epoch 1000
During training, the model weights are saved to the weight/ directory.
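As an illustration of this checkpointing behaviour, the sketch below saves per-epoch weights into weight/; the function and file names are assumptions, not the exact code in main.py.

```python
# Sketch of the checkpointing pattern described above (names are assumptions).
import os
import torch

def save_checkpoint(model, epoch, weight_dir="weight"):
    """Save the model weights for one epoch into the weight/ directory."""
    os.makedirs(weight_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(weight_dir, f"epoch_{epoch}.pth"))
```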
The blink/ directory contains files related to blinks.
- data_collect.py: Collects EAR sequences and labels for training the SVM prediction model.
- train_svm.py: Trains the SVM predictor with the collected data.
- test_svm.py: Tests the trained SVM predictor on other videos.
- blink_detect_threshold.py: Traditional approach of predicting blinks by thresholding the EAR (see the sketch after this list).
- blink_detect_svm.py: Uses the trained SVM model to predict blinks.
- data_analysis.py: Fits the blink intervals to a log-normal curve.
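The sketch below illustrates the classic eye-aspect-ratio (EAR) thresholding idea behind blink_detect_threshold.py; the threshold value, landmark ordering, and function names are assumptions for illustration, not the repository's exact implementation.

```python
# Illustrative EAR-based threshold blink detection (standard eye-aspect-ratio
# approach; the threshold and frame counts here are assumed values).
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of eye landmarks ordered p1..p6."""
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

def detect_blinks(ear_sequence, threshold=0.2, min_consec_frames=2):
    """Mark a blink when EAR stays below the threshold for a few frames."""
    blinks, run = [], 0
    for i, ear in enumerate(ear_sequence):
        if ear < threshold:
            run += 1
        else:
            if run >= min_consec_frames:
                blinks.append(i - run)  # frame index where the blink started
            run = 0
    if run >= min_consec_frames:        # handle a blink at the end of the sequence
        blinks.append(len(ear_sequence) - run)
    return blinks
```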
demo.py uses the trained model to output the corresponding controller rig values for audio clips. The model weight path PATH, the path to the audio files audio_path, and the save path pred_path need to be assigned inside the script.
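A minimal sketch of that workflow is shown below; PATH, audio_path, and pred_path mirror the variables mentioned above, while the directory values, the model class, and the feature-extraction helper are hypothetical placeholders.

```python
# Hypothetical sketch of the demo workflow; only the variable names PATH,
# audio_path, and pred_path come from the script, everything else is assumed.
import os
import glob
import numpy as np

PATH = "weight/epoch_1000.pth"   # trained model weights (assumed filename)
audio_path = "demo/wav"          # directory with input audio clips (assumed)
pred_path = "demo/pred"          # directory for predicted controller values (assumed)

os.makedirs(pred_path, exist_ok=True)
# model = ...                                     # load the repo's model with torch.load(PATH)
for wav in sorted(glob.glob(os.path.join(audio_path, "*.wav"))):
    # features = extract_audio_features(wav)      # hypothetical preprocessing step
    # ctr = model(features).detach().cpu().numpy()  # (num_frames, num_controllers)
    ctr = np.zeros((2, 3))                        # placeholder standing in for predictions
    out_file = os.path.join(pred_path, os.path.basename(wav).replace(".wav", ".txt"))
    np.savetxt(out_file, ctr)                     # one row per frame, as described below
```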
The output of the model is .txt files containing controller values, where each row corresponds to one frame. To visualize the output, you need a MetaHuman model with all the controller rigs listed in valid_attr_names.txt.
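For a quick sanity check of the output, the following sketch loads one predicted .txt file and pairs the first frame with the controller names, assuming whitespace-separated values with one frame per row; the file paths are placeholders.

```python
# Inspect one output file, assuming one frame per row and controller names
# listed line by line in valid_attr_names.txt.
import numpy as np

with open("valid_attr_names.txt") as f:
    controller_names = [line.strip() for line in f if line.strip()]

frames = np.atleast_2d(np.loadtxt("demo/pred/sample.txt"))  # (num_frames, num_controllers)
print(frames.shape, "frames x controllers")
print(dict(zip(controller_names, frames[0])))               # controller values of frame 0
```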