Quality comparison to the original implementation

Question


Opened this issue 2 years ago · 1 comment

Thank you for your great work! I really hope you get approval for publishing the models.
In your notes you write:

This implementation does not include pre-training of phonemes using a large-scale text corpus from the news-crawl dataset.

Does this mean the quality here will be worse?

Answer 1 · 2023-02-09T14:37:15.000Z

Hi @dreamflasher, thank you for your attention.
Unfortunately, I may need to retrain the model on my own GPU before I can publish it, which will take some time.

This implementation does not include pre-training of phonemes using a large-scale text corpus from the news-crawl dataset.

Yes, you can expect a small drop in quality (-0.09, as reported in the paper), but it should still outperform the baseline (VITS) thanks to the other changes.