uLipSync is an Unity asset to do a realtime lipsync.
- Fast calculation using Job and Burst compiler
- No native plugin / No dependency (only official packages)
- I've created this asset using Unity 2020.1.17f1 on Windows 10 (not tested with other versions and operation systems yet)
- Burst and Mathematics should be installed in the Package Manager
- Unity Package
- Download the latest .unitypackage from Release page.
- Git URL (UPM)
- Add
https://github.com/hecomi/uLipSync.git#upm
to Package Manager.
- Add
- Scoped Registry (UPM)
- Add a scoped registry to your project.
- URL:
https://registry.npmjs.com
- Scope:
com.hecomi
- URL:
- Install uLipSync in Package Manager.
- Add a scoped registry to your project.
- Attach
uLipSync
component to the GameObject that hasAudioSource
and plays voice sounds.
- Select the Profile (how to create and calibrate a Profile is described later).
- Attach
uLipSyncBlendshape
component to the GameObject of your character and select targetSkinnedMeshRenderer
.
- Click on the "Add New BlendShape" button to link the Phoneme and BlendShape corresponding to the recognition result.
- Register
uLipSyncBlendshape.OnLipSyncUpdate
touLipSync
in Callback section.
- If you want to use mic input, please attach
uLipSyncMicrophone
.
- Play!
This is the core component for calculating lip-sync. uLipSync
gets the audio buffer from MonoBehaviour.OnAudioFilterRead()
, so it needs to be attached to the same GameObject that plays the audio in AudioSource
. This computation is done in a background thread and is optimized by JobSystem and Burst Compiler.
This component is used to control the blendshape of the SkinndeMeshRenderer
. By registering a blendshape that corresponds to the Phoneme registered in the uLipSync
profile, the results of speech analysis will be reflected in the shape of the mouth.
Create an AudioClip
to play the microphone input and set it to AudioSource
. Attach this component to a GameObject that has uLipSync
. You can start/stop recording by calling StartRecord()
/ StopRecord()
from the script. You can also change the input source by changing the index
. To find the input you want to use, you can use uLipSync.MicUtil.GetDeviceList()
.
uLipSync
extracts voice features called MFCCs from the currently playing sound in real time. This is one of the features that were used for speech recognition before the advent of deep learning. The uLipSync
estimates which MFCCs of each phoneme the currently playing sound is close to, and issues a callback with the relevant information. The uLipSyncBlendShape
uses the callback to smoothly move the blendshape of the SkinnedMeshRenderer
.
An asset called Profile
is used to register this MFCC and related calculation parameters.
- MFCC
- Press
Add Phoneme
to register a new phoneme. By registering the name of the phoneme (e.g. A, I, E, O, U) and the corresponding MFCC according to the calibration method described below, you can estimate the phoneme.
- Press
- Parameters
- Mfcc Data Count
- The number of MFCC data to be registered.
- Mel Filter Bank Channels
- The number of Mel Filter Bank channels needed in the process of calculating MFCCs.
- Target Sample Rate
- The frequency at which the input data is downsampled to lighten the calculation. For example, by default, 48000 Hz data is input to
OnAudioFilterRead()
, but by default it is downsampled by 1/3 to 16000 Hz.
- The frequency at which the input data is downsampled to lighten the calculation. For example, by default, 48000 Hz data is input to
- Sample Count
- The number of buffers of sound needed for MFCC calculations. The default is 1024 samples at 16000 Hz, so about 0.064 seconds (~4 frames) of data is used (the calculation itself is done every frame, overlapping).
- Min Volume
- This is the minimum value of the input volume (Log10 applied, 0.001 would be -3).
- Max Volume
- The maximum value of the input volume. The normalized volume in combination with
Min Volume
will be output as the volume for the callback (OnLipSyncUpdate()
).
- The maximum value of the input volume. The normalized volume in combination with
- Mfcc Data Count
- Import / Export JSON
- You can output
Profile
to JSON and vice versa, or import it. See below for details.
- You can output
The registered callback will be issued at the time of Update()
after the lip-sync calculation is finished. The LipSyncInfo
structure passed as an argument looks like the following.
public struct LipSyncInfo
{
public int index;
public string phenome;
public float volume;
public float rawVolume;
public float distance;
}
index
- Index of the recognized MFCC
phenome
- Phoneme of the recognized MFCC (registered string)
volume
- The volume normalized by
Min Volume
andMax Volume
- The volume normalized by
rawVolume
- Volume before normalization
distance
- Error between the current input and the registered MFCC (the larger the error, the lower the confidence)
An example code is as follows:
using UnityEngine;
using uLipSync;
public class DebugPrintLipSyncInfo : MonoBehaviour
{
public void OnLipSyncUpdate(LipSyncInfo info)
{
Debug.LogFormat(
$"PHENOME: {info.phenome}, " +
$"VOL: {info.volume}, " +
$"DIST: {info.distance} ");
}
}
- Output Sound Gain
- Set it to 0 if you want lip-sync but don't want to output audio because if you set the volume of
AudioSource
to 0, the recognition itself will not be done.
- Set it to 0 if you want lip-sync but don't want to output audio because if you set the volume of
- Volume
- Current Volume
- Current raw volume
- Min Volume
- Minimum volume entered since startup to date.
- Max Volume
- Maximum volume entered since startup.
- Normalized Volume
- Volume normalized by the
Min Volume
andMax Volume
registered inProfile
.
- Volume normalized by the
- Current Volume
- MFCC
- The MFCC for the current input, displayed in real time.
Opening this FoldOut will affect the performance of the game as it will cause the editor to draw every frame.
First, click the Create button under Profile to create a new profile. The created profile asset will be automatically registered to the profile of uLipSync
.
Next, click the Add Phenome button to add a phoneme, such as A, I, E, O, U.
Then start the game, and hold down the Calib button in A when it is talking "aaaaaaa". Likewise, continue to speak "iiiiiii" and press and hold the Calib button in I. Calibrate all the phonemes like this to register the MFCCs.
Please prepare an AudioClip that is compatible with each Phoneme. While one of them is playing, press the corresponding Calib button as described above to reflect the result of the sound analysis into the profile.
You can also send calibration requests from scripts by calling the uLipSync.RequestCalibration(int index)
. The sample CalibrationByKeyboardInput.cs
shows how to calibrate with numeric keys like this:
lipSync = GetComponent<uLipSync>();
for (int i = 0; i < lipSync.profile.mfccs.Count; ++i)
{
var key = (KeyCode)((int)(KeyCode.Alpha1) + i);
if (Input.GetKey(key)) lipSync.RequestCalibration(i);
}
Since Profile is a ScriptableObject, changes are not saved in a build. Instead, it is possible to export and import the profile in Json format. From the script, do the following:
var lipSync = GetComponent<uLipSync>();
var profile = lipSync.profile;
// Export
profile.Export(path);
// Import
profile.Import(path);
Examples include Unity-chan assets.
© Unity Technologies Japan/UCL