thorstenMueller/Thorsten-Voice

How to make coqui thorsten voice "more fluent"


Hello,

Using coqui.ai and gruut, we have trained our own version of the "Thorsten voice" with the provided VITS recipe (~60K steps). The results are good, but the rhythm of the speech is not as good as that of the original "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

  • Do we need to train for more steps?
  • Are there some specific parameters to tune?
  • Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

Hi,
if I remember correctly, we trained our Coqui-VITS model up to nearly 1000k steps, but there were no improvements in quality, either audible or technical (MOSNET, DNSMOS, SRMR), beyond the 600k mark.
I suggest continuing training at least up to 300k steps.
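
For anyone who wants to check on their own run where the quality levels off, here is a minimal sketch in Python. It assumes you have already scored a few checkpoints with some objective metric where higher is better (for example a MOS estimate). The function name `plateau_step`, the thresholds, and the numbers in the usage example are hypothetical placeholders, not values from our training.

```python
from typing import Optional, Sequence, Tuple


def plateau_step(
    history: Sequence[Tuple[int, float]],
    min_delta: float = 0.05,
    patience: int = 2,
) -> Optional[int]:
    """Return the step of the last checkpoint that still improved the score.

    history   -- (training_step, score) pairs, sorted by step, higher is better
    min_delta -- smallest score gain that counts as a real improvement (assumed)
    patience  -- consecutive checkpoints without improvement before we call
                 it a plateau (assumed)

    Returns None if the score has not plateaued within the given history.
    """
    if not history:
        return None

    best_step, best_score = history[0]
    stale = 0  # checkpoints seen since the last real improvement

    for step, score in history[1:]:
        if score > best_score + min_delta:
            best_step, best_score = step, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return best_step
    return None


if __name__ == "__main__":
    # Placeholder scores for illustration only, not measured results.
    example = [
        (100_000, 3.0),
        (200_000, 3.4),
        (300_000, 3.6),
        (400_000, 3.62),
        (500_000, 3.61),
    ]
    print(plateau_step(example))
```

If the function returns `None`, the score was still improving at the last checkpoint, so training longer is probably worth it. Listening tests should still have the final say, since objective metrics do not capture rhythm very well.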

Thank you!