train our own voice model
chandrakanthlns opened this issue · 4 comments
Hi,
I found your repo very interesting, so I am trying it out. I am curious about training on our own voice files to create a checkpoint without involving text (as I have seen previous issues point to the Coqui model training for reference) and without altering config.json. Can you please guide us on how to proceed with this?
@Edresson, thank you so much for your excellent work. Very nice paper.
I tried speaker adaptation with the multilingual zero-shot pre-trained model (359 samples, ~25 minutes), but it is not giving good results.
I transliterated Hindi and Telugu transcripts into English script and used them as transcripts to fine-tune the pre-trained model.
The output is not good. May I know the reason for this, and how can we overcome it?
@Edresson, kindly share your suggestions.
Thanks
Hey, speaker adaptation is for adding speakers to already-supported languages. If you want to add a language, you'll have to add it to languages_ids.json and manually add one embedding to the model's language embedding table. We have successfully added languages with ~1000 samples.
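In case it helps others, here is a minimal sketch of those two steps. It assumes a PyTorch checkpoint whose weights sit under a `"model"` key, with the language embedding stored as `emb_l.weight`; those key names and file paths are assumptions and vary between versions, so verify them against your own checkpoint first.

```python
import json
import torch

# Assumed paths/keys; adjust to your setup before running.
CKPT_PATH = "best_model.pth"
LANG_IDS_PATH = "languages_ids.json"

# 1) Register the new language ID in languages_ids.json.
with open(LANG_IDS_PATH) as f:
    lang_ids = json.load(f)
lang_ids["hi"] = max(lang_ids.values()) + 1  # e.g. adding Hindi
with open(LANG_IDS_PATH, "w") as f:
    json.dump(lang_ids, f, indent=4)

# 2) Grow the language embedding table by one row.
ckpt = torch.load(CKPT_PATH, map_location="cpu")
state = ckpt["model"]            # assumption: weights live under "model"
key = "emb_l.weight"             # assumption: language embedding key name
old_emb = state[key]             # shape: (n_languages, embed_dim)
new_row = old_emb.mean(dim=0, keepdim=True)  # init new row as the mean
state[key] = torch.cat([old_emb, new_row], dim=0)
torch.save(ckpt, "best_model_extended.pth")
```

Initializing the new row as the mean of the existing language embeddings is just one reasonable choice; random initialization also works, since the row is learned during fine-tuning anyway.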
@WeberJulian Thank you so much.
Do I need to touch the speaker encoder model and fine-tune it with the 1000 samples to get better reference-speaker quality?
Thanks
No, the speaker encoder is general enough :)