ex3ndr/supervoice-vall-e-2

Misspelling issues.

I have tried your models (voicebox and this one). vall-e-2 sounds more natural, but there are a lot of misspellings in the generated speech. Is this because of the dataset? Have you tried training voicebox on Libriheavy?