How to make coqui thorsten voice "more fluent"

Question

Hello,

Using coqui.ai and gruut, we have trained an example of the "Thorsten voice" with the provided VITS recipe (~60K steps). The results are good, but the rhythm of the speech is not as good as that of the original "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

Do we need to train for more steps?
Are there some specific parameters to tune?
Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

Answer 1 · 2024-03-19T10:14:42.000Z

Hi,
If I remember correctly, we trained our Coqui-VITS model up to nearly 1000k steps, but there were no improvements in quality, either audible or technical (MOSNET, DNSMOS, SRMR), beyond the 600k mark.
I suggest continuing training to at least 300k steps.
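
For anyone who wants to check this on their own run: one rough way to decide when to stop is to score each saved checkpoint with an objective metric and look for the point where the score stops improving. The sketch below is plain Python and does not use any Coqui API. The function name `find_plateau_step`, the `min_delta` and `patience` parameters, and the example scores are all hypothetical and made up for illustration; it also assumes a metric where higher is better.

```python
from typing import Optional, Sequence, Tuple


def find_plateau_step(
    history: Sequence[Tuple[int, float]],
    min_delta: float = 0.01,
    patience: int = 2,
) -> Optional[int]:
    """Return the step of the best checkpoint once the metric has plateaued.

    history   -- (training_step, score) pairs in increasing step order,
                 where a higher score means better quality.
    min_delta -- smallest gain over the best score that counts as progress.
    patience  -- number of consecutive checkpoints without progress
                 after which the run is considered plateaued.

    Returns None if the history is empty or no plateau is found yet.
    """
    if not history:
        return None

    best_step, best_score = history[0]
    stale = 0  # consecutive checkpoints without a meaningful gain
    for step, score in history[1:]:
        if score > best_score + min_delta:
            best_step, best_score = step, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return best_step
    return None


# Made-up scores for illustration only, not measurements from any real model.
example_history = [
    (100_000, 3.00),
    (200_000, 3.20),
    (300_000, 3.40),
    (400_000, 3.405),
    (500_000, 3.40),
]
print(find_plateau_step(example_history))  # 300000
```

A result of `None` means the score was still improving at the last checkpoint, so continuing training is probably worthwhile. Objective scores do not capture rhythm directly, so listening to samples from each checkpoint is still needed.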

Answer 2 · 2024-03-20T14:30:02.000Z

Thank you!