train our own voice model
chandrakanthlns opened this issue · 4 comments
Hi,
I found your repo very interesting, so I am trying it out. I am curious about training on our own voice files to create a checkpoint without involving text (as I have seen previous issues point to the Coqui model training for reference) and without altering config.json. Can you please guide us on how to proceed with this?
@Edresson, thank you so much for your excellent work. Very nice paper.
I tried speaker adaptation with the multilingual zero-shot pre-trained model (359 samples, ~25 minutes), but it is not giving good results.
I transliterated Hindi and Telugu transcripts into English script and used them as transcripts to fine-tune the pre-trained model.
The output is not good. May I know the reason for this, and how can we overcome it?
@Edresson, kindly share your suggestions.
Thanks
Hey, speaker adaptation is for adding speakers to already-supported languages. If you want to add a language, you'll have to add it to languages_ids.json and manually add one embedding to the model's language embedding table. We have successfully added languages with ~1000 samples.
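In case it helps others, here is a minimal sketch of those two steps. It assumes a PyTorch checkpoint whose weights sit under a `"model"` key, with the language embedding stored as `emb_l.weight`; those key names and file paths are assumptions and vary between versions, so verify them against your own checkpoint first.

```python
import json
import torch

# Assumed paths/keys; adjust to your setup before running.
CKPT_PATH = "best_model.pth"
LANG_IDS_PATH = "languages_ids.json"

# 1) Register the new language ID in languages_ids.json.
with open(LANG_IDS_PATH) as f:
    lang_ids = json.load(f)
lang_ids["hi"] = max(lang_ids.values()) + 1  # e.g. adding Hindi
with open(LANG_IDS_PATH, "w") as f:
    json.dump(lang_ids, f, indent=4)

# 2) Grow the language embedding table by one row.
ckpt = torch.load(CKPT_PATH, map_location="cpu")
state = ckpt["model"]            # assumption: weights live under "model"
key = "emb_l.weight"             # assumption: language embedding key name
old_emb = state[key]             # shape: (n_languages, embed_dim)
new_row = old_emb.mean(dim=0, keepdim=True)  # init new row as the mean
state[key] = torch.cat([old_emb, new_row], dim=0)
torch.save(ckpt, "best_model_extended.pth")
```

Initializing the new row as the mean of the existing language embeddings is just one reasonable choice; random initialization also works, since the row is learned during fine-tuning anyway.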
@WeberJulian Thank you so much.
Do I need to touch the speaker encoder model and fine-tune it with the 1000 samples to get better reference-speaker quality?
Thanks
No, the speaker encoder is general enough :)