Generate an FBX of a phoneme lip-sync animation from a single audio file, using Wav2Vec2 to analyze the phonemes. This gives animators a very basic animation to start from.
A virtual environment with Python 3.7 is highly recommended, as that version is supported by the FBX Python SDK.
git clone https://github.com/yamahigashi/Wav2Vec2FBX.git
cd Wav2Vec2FBX
pip install -r requirements.txt
Download the FBX SDK from Autodesk (https://www.autodesk.com/developer-network/platform-technologies/fbx-sdk-2020-0) and place the libraries (fbx.pyd, FbxCommon.py and fbxsip.pyd) into the lib folder.
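Before the first run, it can help to confirm that the three SDK files are where the tool expects them. The following is a minimal sketch, not part of the project: the helper name missing_fbx_files is hypothetical, and it assumes the lib folder sits in the current working directory.

```python
from pathlib import Path

# File names are taken from the installation step above.
REQUIRED_FILES = ("fbx.pyd", "FbxCommon.py", "fbxsip.pyd")


def missing_fbx_files(lib_dir):
    """Return the required FBX SDK files that are not present in lib_dir."""
    lib_path = Path(lib_dir)
    return [name for name in REQUIRED_FILES if not (lib_path / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    missing = missing_fbx_files("lib")
    if missing:
        print("Missing FBX SDK files in lib:", ", ".join(missing))
    else:
        print("All FBX SDK files found.")
```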
python main.py input_audio.wav
This will generate input_audio.fbx in the same folder as the input file.
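If you script around the tool, the output location can be derived from the input path. This is a small sketch of the naming rule described above; the function name expected_fbx_path is hypothetical and not part of the project.

```python
from pathlib import Path


def expected_fbx_path(audio_path):
    """Return the path where the FBX is expected: same folder, .fbx suffix."""
    return Path(audio_path).with_suffix(".fbx")


if __name__ == "__main__":
    print(expected_fbx_path("input_audio.wav"))
```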
The behaviour can be changed through the configuration file assets/config.toml.
[keyframes]
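# Number of frames used to interpolate between an IPA phoneme and the closed mouth
interpolation = 5
# Number of frames used to interpolate an IPA phoneme made up of multiple visemes
consecutive_viseme_frame = 3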
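To illustrate what the interpolation setting might control, here is a rough sketch that ramps a viseme weight from the closed mouth up to full strength and back. The function viseme_keyframes, the weight range 0.0 to 1.0, and the ramp shape are all assumptions made for illustration; the project's actual keyframe logic is not shown in this document and may differ.

```python
def viseme_keyframes(start_frame, end_frame, interpolation=5):
    """Return (frame, weight) keys for one viseme: ramp up, hold, ramp down.

    Assumption: weight 0.0 means closed mouth and 1.0 means full viseme.
    """
    # Start blending `interpolation` frames early, but never before frame 0.
    ramp_in = max(0, start_frame - interpolation)
    # Blend back to the closed mouth `interpolation` frames after the end.
    ramp_out = end_frame + interpolation
    return [
        (ramp_in, 0.0),
        (start_frame, 1.0),
        (end_frame, 1.0),
        (ramp_out, 0.0),
    ]


if __name__ == "__main__":
    for frame, weight in viseme_keyframes(10, 20, interpolation=5):
        print(frame, weight)
```

A larger interpolation value would give softer transitions in this sketch, while a smaller one would make the mouth snap to each shape.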
Settings for preprocessing the audio file. The file is first split at silences; any piece that is still too long is then split further according to the settings below.
[audio_settings]
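# Minimum length in milliseconds for a quiet stretch to count as silence (default 500)
min_silence_len_ms = 500
# Silence threshold in dB (default -36)
silence_thresh_db = -36
# Maximum audio length in milliseconds; anything longer is split into multiple pieces (default 5000)
maximum_duration_ms = 5000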
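The two-stage splitting described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the audio is represented as a list of loudness values in dB, one per frame_ms milliseconds, and the function names split_on_silence and split_long_segment are hypothetical.

```python
def split_on_silence(levels_db, frame_ms=10,
                     min_silence_len_ms=500, silence_thresh_db=-36):
    """Return (start_ms, end_ms) spans of non-silent audio."""
    min_frames = max(1, min_silence_len_ms // frame_ms)
    segments = []
    seg_start = None  # index of the first frame of the open segment
    quiet_run = 0     # number of consecutive quiet frames seen so far
    for i, level in enumerate(levels_db):
        if level < silence_thresh_db:
            quiet_run += 1
            if seg_start is not None and quiet_run >= min_frames:
                # Close the segment at the frame where the silence began.
                segments.append((seg_start * frame_ms,
                                 (i - quiet_run + 1) * frame_ms))
                seg_start = None
        else:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
    if seg_start is not None:
        # Close the last segment, trimming any trailing quiet frames.
        segments.append((seg_start * frame_ms,
                         (len(levels_db) - quiet_run) * frame_ms))
    return segments


def split_long_segment(start_ms, end_ms, maximum_duration_ms=5000):
    """Cut one span into pieces no longer than maximum_duration_ms."""
    if maximum_duration_ms <= 0:
        raise ValueError("maximum_duration_ms must be positive")
    chunks = []
    cursor = start_ms
    while cursor < end_ms:
        chunk_end = min(cursor + maximum_duration_ms, end_ms)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


if __name__ == "__main__":
    # 1 s of speech, 0.6 s of silence, 0.5 s of speech (10 ms frames).
    levels = [-10] * 100 + [-50] * 60 + [-10] * 50
    for start, end in split_on_silence(levels):
        print(split_long_segment(start, end))
```

In this sketch, raising min_silence_len_ms merges segments separated by short pauses, and lowering silence_thresh_db treats fewer frames as silent.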
The phoneme-to-viseme correspondence table. The phonemes detected by Wav2Vec2 are mapped to mouth shapes (visemes). Each phoneme maps to a list of visemes, given as follows.
[ipa_to_arpabet]
'ɔ' = ["a"]
'ɑ' = ["a"]
'i' = ["i"]
# Long Vowels
'e ː' = ["e", "e"]
'o ː' = ["o", "o"]
# -------- snip --------------
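As a sketch of how such a table could be applied, the snippet below expands a phoneme sequence into a flat viseme sequence. Only the entries shown in the excerpt above are included; the function name phonemes_to_visemes and the choice to skip phonemes missing from the table are assumptions, not the project's documented behaviour.

```python
# Entries copied from the excerpt above; the real table contains more.
IPA_TO_VISEMES = {
    "ɔ": ["a"],
    "ɑ": ["a"],
    "i": ["i"],
    # Long vowels map to two consecutive visemes.
    "e ː": ["e", "e"],
    "o ː": ["o", "o"],
}


def phonemes_to_visemes(phonemes, table=IPA_TO_VISEMES):
    """Expand each phoneme into its visemes, in order."""
    visemes = []
    for phoneme in phonemes:
        # Assumption: phonemes without a table entry are skipped.
        visemes.extend(table.get(phoneme, []))
    return visemes


if __name__ == "__main__":
    print(phonemes_to_visemes(["ɑ", "i", "o ː"]))
```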
You can deploy this package as a binary for environments without Python using cx_Freeze.
python setup.py build
This will generate a binary for your platform.