thu-ml/Bridge-TTS

Interesting work. Hoping for a code/model release.

ChangeFWorld opened this issue · 2 comments

I'm curious whether this method can generate different samples. Since the text distribution is fixed, it seems likely to generate the same sample whenever the text input doesn't change.

Any plans for a code release?

Hi, thank you for your interest in our work, and apologies for the late reply. In practice, we observe that sampling with the Bridge SDE generally produces the same level of diversity as Grad-TTS. That said, the diversity of both methods is subtle and shows up mainly in voice quality (e.g., whether there is an artifact) rather than in variations of speech tempo or prosody. In my opinion, this is largely due to the one-to-one mapping in the training data used, and unfixing the text distribution, as Grad-TTS does, cannot fundamentally address the diversity problem.

On the other hand, fixing the text distribution is not required in Bridge-TTS. It is a flexible practical choice that can be changed for different tasks/datasets.
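
To illustrate the point about sampling from a fixed text distribution, here is a toy sketch. It is not the Bridge-TTS sampler: the 1-D SDE `dx = -x dt + noise_scale dW`, the function `sample_from_fixed_prior`, and the value of `text_latent` are all hypothetical stand-ins chosen for illustration. It only shows that, starting from the same fixed point, a noise-free (ODE-like) solver returns identical outputs, while a stochastic (SDE) solver can return different outputs for different random seeds.

```python
import math
import random


def sample_from_fixed_prior(x_start, noise_scale, seed, steps=100):
    """Euler-Maruyama on a toy SDE dx = -x dt + noise_scale dW.

    x_start plays the role of a deterministic text latent (an assumption
    made for this sketch, not the model's actual prior).
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    x = x_start
    for _ in range(steps):
        drift = -x  # toy drift term, not a learned network
        noise = noise_scale * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = x + drift * dt + noise
    return x


text_latent = 1.5  # hypothetical fixed starting point for one text input

# noise_scale = 0: deterministic sampling, same output for every seed
ode_samples = [sample_from_fixed_prior(text_latent, 0.0, s) for s in range(3)]

# noise_scale > 0: stochastic sampling, output varies with the seed
sde_samples = [sample_from_fixed_prior(text_latent, 0.5, s) for s in range(3)]

print("deterministic:", ode_samples)
print("stochastic:   ", sde_samples)
```

How much of that variation turns into perceptible differences (tempo, prosody) is a separate question that depends on the model and the training data, as discussed above.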

Regarding the code, we plan to release it upon acceptance. Sorry for the inconvenience.