Edresson/YourTTS

Fine-Tune for Voice Conversion?

jsl303 opened this issue · 2 comments

I've tried voice conversion by providing driving and target samples, but the result doesn't sound like the target at all. It's somewhat closer to the driving sample.
Are there instructions on how to fine-tune the model to make the output sound better?
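
For reference, this is roughly how I'm invoking voice conversion (a minimal sketch using the Coqui TTS Python API; it assumes a recent coqui-tts release that exposes `voice_conversion_to_file`, and the wav paths are placeholders — older versions do the same thing through the CLI's `--reference_wav` flag instead):

```python
# Minimal sketch of YourTTS voice conversion via the Coqui TTS Python API.
# Assumes a recent coqui-tts release exposing voice_conversion_to_file;
# the wav paths below are placeholders.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

tts.voice_conversion_to_file(
    source_wav="driving.wav",   # speech whose content is kept
    target_wav="target.wav",    # voice the output should sound like
    file_path="converted.wav",
)
```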

Same problem here: the generated voice sounds almost the same as the driving sample.
I also found that the code only fine-tunes the vocoder (HiFi-GAN).

The training procedure for voice conversion and TTS is the same. If you like, you can follow the recipe that replicates the first experiment proposed in the YourTTS paper. The recipe replicates the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). However, if you are interested in multilingual training, the VitsArgs instance in the recipe contains commented-out parameters that should be enabled for multilingual training: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
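
For anyone landing here, the relevant part of the linked recipe looks roughly like this (a sketch of the `VitsArgs` instance from `train_yourtts.py`; the paths below are placeholders that the recipe normally fills in for you, so check the file for the exact, current values):

```python
# Sketch of the VitsArgs instance from the linked recipe (train_yourtts.py).
# Field names follow the recipe; the paths are placeholders.
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    d_vector_file=["path/to/speakers.pth"],  # speaker embeddings extracted by the recipe
    use_d_vector_file=True,
    d_vector_dim=512,
    speaker_encoder_model_path="path/to/model_se.pth.tar",
    speaker_encoder_config_path="path/to/config_se.json",
    use_speaker_encoder_as_loss=True,  # Speaker Consistency Loss from the paper
    # Commented out in the recipe; enable these for multilingual training:
    use_language_embedding=True,
    embedded_language_dim=4,
)
```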