Edresson/YourTTS

Fine-Tune for Voice Conversion?

jsl303 opened this issue · 2 comments

I've tried voice conversion by providing driving and target samples, but the result doesn't sound like the target at all. It's somewhat closer to the driving sample.
Are there instructions on how to fine-tune the model to make the output sound better?
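
For reference, this is roughly how I'm invoking voice conversion (a minimal sketch using the Coqui TTS Python API; it assumes a recent coqui-tts release that exposes `voice_conversion_to_file`, and the wav paths are placeholders — older versions do the same thing through the CLI's `--reference_wav` flag instead):

```python
# Minimal sketch of YourTTS voice conversion via the Coqui TTS Python API.
# Assumes a recent coqui-tts release exposing voice_conversion_to_file;
# the wav paths below are placeholders.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

tts.voice_conversion_to_file(
    source_wav="driving.wav",   # speech whose content is kept
    target_wav="target.wav",    # voice the output should sound like
    file_path="converted.wav",
)
```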

Same problem here: the generated voice sounds almost the same as the driving sample.
I also found that the code only fine-tunes the vocoder (HiFi-GAN).

The training procedure for voice conversion and TTS is the same. If you like, you can follow the recipe that replicates the first experiment proposed in the YourTTS paper. The recipe replicates the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). However, if you are interested in multilingual training, the VitsArgs instance in the recipe contains commented-out parameters that should be enabled for multilingual training: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
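
For anyone landing here, the relevant part of the linked recipe looks roughly like this (a sketch of the `VitsArgs` instance from `train_yourtts.py`; the paths below are placeholders that the recipe normally fills in for you, so check the file for the exact, current values):

```python
# Sketch of the VitsArgs instance from the linked recipe (train_yourtts.py).
# Field names follow the recipe; the paths are placeholders.
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    d_vector_file=["path/to/speakers.pth"],  # speaker embeddings extracted by the recipe
    use_d_vector_file=True,
    d_vector_dim=512,
    speaker_encoder_model_path="path/to/model_se.pth.tar",
    speaker_encoder_config_path="path/to/config_se.json",
    use_speaker_encoder_as_loss=True,  # Speaker Consistency Loss from the paper
    # Commented out in the recipe; enable these for multilingual training:
    use_language_embedding=True,
    embedded_language_dim=4,
)
```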