erew123/alltalk_tts

How to fix robotic/metallic voice when adding new voices

Closed issue · 1 comment

I installed AllTalk as part of text-generation-webui, and it works fine with the standard voices.
The problem appears when I try to add new files to the "voices" folder. I converted some .mp3 voice samples to .wav and placed them in the "voices" folder, but the generated speech sounds really metallic for some reason.
The voice samples don't have background noise and are longer than 1 minute.
The converter offers several parameters, and I don't know whether I'm choosing the right ones: channels ("mono" or "stereo"), sample rate (22050Hz, 44100Hz, 352800Hz, etc.) and encoding (Signed 16-bit PCM, Signed 32-bit PCM, 64-bit float, etc.).
Any suggestions on how to fix this robotic voice issue?

Hi @guispfilho

Please go to the settings and documentation page, and then to the section on using voice samples.

Beyond that, a lot depends on the voice sample you are using. The XTTS model tries to reproduce the sound of the voice you have given it. If that voice strays from a "normal" human voice, e.g. a cartoon character, the model may have difficulty reproducing it. If your samples are good and set up correctly, you can try finetuning to improve the audio reproduction.
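
To check whether a sample is "set up correctly", something like the sketch below may help. It is only an illustration, not part of AllTalk. The target format (mono, 22050Hz, 16-bit PCM) is my assumption here, so please confirm it against the voice samples section of the documentation. The helper names (describe_wav, matches_target, ffmpeg_command, convert) are made up for this example, and the conversion step assumes ffmpeg is installed and on your PATH.

```python
import subprocess
import wave

# Assumed target format; verify these values in the AllTalk documentation.
TARGET_CHANNELS = 1   # mono
TARGET_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM
TARGET_RATE = 22050   # Hz


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, duration_s).

    The standard-library wave module only reads PCM files; it raises
    wave.Error for other encodings such as floating-point WAV.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        return wav.getnchannels(), wav.getsampwidth(), rate, duration


def matches_target(path):
    """True if the file already has the assumed channels, bit depth and rate."""
    channels, width, rate, _ = describe_wav(path)
    return (channels, width, rate) == (TARGET_CHANNELS, TARGET_WIDTH, TARGET_RATE)


def ffmpeg_command(src, dst):
    """Build an ffmpeg command that converts src to the assumed target format."""
    return [
        "ffmpeg",
        "-y",                         # overwrite dst if it already exists
        "-i", src,                    # input file (.mp3, .wav, ...)
        "-ac", str(TARGET_CHANNELS),  # number of audio channels
        "-ar", str(TARGET_RATE),      # output sample rate in Hz
        "-c:a", "pcm_s16le",          # signed 16-bit little-endian PCM
        dst,
    ]


def convert(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(src, dst), check=True)
```

For example, convert("sample.mp3", "voices/sample.wav") followed by matches_target("voices/sample.wav") would confirm the result has the assumed format. A very high sample rate or a float encoding in the source file is handled by the conversion step, so there is no need to pick those in the export dialog.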

Thanks