Fine tune YourTTS on own audio
karynaur opened this issue · 11 comments
Hey, thanks for the incredible work! Is there a way to fine-tune the model on custom audio and compare the results to zero-shot?
Hi Aditya,
Thanks :)
Fine-tuning is the same as a normal training run using --restore_path in Coqui TTS (we only guarantee that 1/4 of each batch is composed of the adapted speaker's voice). We are thinking of making friendly guidelines available. However, due to the ethical implications (fake news/deepfakes), we are thinking a lot about it. I will keep you posted if we make friendly guidelines available.
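For anyone who wants a concrete starting point before official guidelines exist, here is a minimal sketch of what such a run looks like, assuming the structure of the Coqui TTS VITS recipes (the dataset path, formatter, and checkpoint path are placeholders, and field names may differ between TTS versions; check the recipe scripts in the repo):

```python
from trainer import Trainer, TrainerArgs

from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits

# Hypothetical dataset with the adapted speaker's audio and transcriptions,
# laid out in LJSpeech style (metadata.csv + wavs/ folder).
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path="/data/my_voice",
)

config = VitsConfig(output_path="/tmp/yourtts_ft", datasets=[dataset_config])
train_samples, eval_samples = load_tts_samples(config.datasets, eval_split=True)
model = Vits.init_from_config(config)

# Restoring from a pretrained checkpoint is what turns a normal training run
# into fine-tuning; the checkpoint path is a placeholder.
trainer = Trainer(
    TrainerArgs(restore_path="/checkpoints/yourtts.pth"),
    config,
    output_path=config.output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```

Note that this plain VitsConfig will not reproduce the full YourTTS setup (d-vectors, speaker encoder, etc.); it only illustrates where --restore_path plugs in.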
Dear Edresson, could you elaborate on the steps for fine-tuning on your own voice?
I'm also interested in knowing the fine-tuning steps.
It would be great to get some guidance on fine-tuning.
I'm also curious about the steps.
I believe it was a mistake when OpenAI did things like that. It only allows for a brief period in which a smaller group of people (some with nefarious intent) figure it out and don't share it with the rest of the world. Anyway, if only 1/4 of the batch is composed of the adapted speaker and 4/4 is required for more accurate results, isn't it relatively trivial to modify it to that end? I haven't looked at the code yet, but it doesn't sound too tough on its face.
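If someone does want to experiment with that, one naive approach is to skew the sample list before the trainer ever sees it. A sketch, assuming the list of dicts that Coqui's load_tts_samples() returns (the "speaker_name" key is real; "my_speaker" is a placeholder), and not necessarily what the authors did:

```python
# Split the training set by speaker; samples are dicts with a "speaker_name"
# key as returned by load_tts_samples().
adapted = [s for s in train_samples if s["speaker_name"] == "my_speaker"]
others = [s for s in train_samples if s["speaker_name"] != "my_speaker"]

# "4/4": train on the adapted speaker only ...
train_samples = adapted
# ... or keep the other speakers but oversample the adapted one, e.g. 3x:
# train_samples = others + adapted * 3
```

Whether full-batch adaptation actually helps, versus overfitting and losing the multi-speaker prior, is an open question, which may be part of why the 1/4 guarantee exists.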
Is accuracy being curtailed in this way in the web demo? I'm curious whether the demo is demonstrating the full potential of the model.
Will there be any steps on how to actually fine-tune the model?
My voice is very deep, and when I tried training on it, the result didn't come close to my voice, sadly. I was led here, but I don't see any steps on how to fine-tune a model.
Hi @Edresson
In order to fine-tune with custom voices, do we also need transcriptions for those new custom voices?
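For reference (this isn't answered directly in the thread): the standard Coqui TTS dataset formatters do expect transcriptions. For example, the ljspeech formatter reads a pipe-separated metadata.csv (file stem | raw text | normalized text) with the audio in a wavs/ subfolder next to it; the lines below are made-up placeholders:

```
audio_0001|Hello there.|Hello there.
audio_0002|This is my custom voice.|This is my custom voice.
```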
@TejaswiniiB @etmyhome @mandray @leminhyen2 @e0xextazy @karynaur
Recently, we created a recipe that makes everything easier. If you like, you can try to fine-tune the model with it. The recipe replicates the first experiment proposed in the YourTTS paper: single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). If you are interested in multilingual training instead, the parameters that need to be enabled are present, commented out, in the VitsArgs instance (see the sketch below): https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
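For those who specifically asked about multilingual training, a rough sketch of what enabling those switches looks like, mirroring the recipe's VitsArgs (check the linked train_yourtts.py for the exact, current values; the d-vector file path is a placeholder that the recipe normally computes for you):

```python
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    d_vector_file=["/path/to/speakers.pth"],  # placeholder; the recipe extracts this
    use_d_vector_file=True,
    d_vector_dim=512,
    # Commented out in the single-language VCTK recipe; enable for
    # multilingual training:
    use_language_embedding=True,
    embedded_language_dim=4,
)
```

model_args is then passed to VitsConfig(model_args=model_args, ...) along with one dataset config per language.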
@Edresson thank you for making the recipe public! I haven't tested the script yet, but I really appreciate your team's decision to open-source this. All the best to the Coqui team :)