lucoiso/UEAzSpeech

Mismatch between viseme and audio data


Sometimes a TTS request takes about twice as long as usual to complete. The generated sound wave has the correct duration and plays normally, but the whole viseme timeline comes out twice as long, so the viseme data no longer matches the audio. What causes this?
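To make the symptom concrete, here is a minimal sketch of how the mismatch could be detected and worked around by rescaling the viseme offsets to the audio duration. It does not address the root cause. `VisemeEvent` and `RescaleToAudio` are hypothetical names used only for illustration and are not part of UEAzSpeech or the Azure Speech SDK. The sketch assumes the viseme timeline length and the audio duration are both already available in milliseconds.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for one viseme event; not a UEAzSpeech type.
struct VisemeEvent {
    int id;          // viseme identifier
    double offsetMs; // start time on the viseme timeline, in milliseconds
};

// Compares the viseme timeline length with the audio duration.
// If they differ by more than `tolerance` (relative to the audio duration),
// every offset is scaled so the timeline ends where the audio ends.
// Returns true if the offsets were rescaled, false if they were left as-is.
bool RescaleToAudio(std::vector<VisemeEvent>& events,
                    double timelineMs,
                    double audioMs,
                    double tolerance = 0.1) {
    if (timelineMs <= 0.0 || audioMs <= 0.0) {
        return false; // nothing sensible to compare against
    }

    const double ratio = timelineMs / audioMs; // about 2.0 in the reported case
    if (std::fabs(ratio - 1.0) <= tolerance) {
        return false; // timeline already matches the audio closely enough
    }

    const double scale = audioMs / timelineMs;
    for (VisemeEvent& event : events) {
        event.offsetMs *= scale;
    }
    return true;
}
```

For example, with a 4000 ms viseme timeline and a 2000 ms sound wave, a viseme at 1000 ms would be moved to 500 ms. Uniform scaling is only reasonable if the timeline is stretched evenly, which is an assumption here and not something the report confirms.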