NTT123/vietTTS

Training with other dataset

TruongThuyLiem opened this issue · 2 comments

Hi all, I see that you used MFA to prepare the dataset (TextGrid files). I tried to use it, but it has a lot of issues with Vietnamese. I generated a lexicon.txt file with the g2p model, but when I use the acoustic model to generate the TextGrid files, I get this error:
"There were phones in the dictionary that do not have acoustic models: au_T4, au_T6, eu_T5, ieu_T5, ui2_T2, uoi2_T2, uoi3_T6, uou_T1, uou_T3".
I also tried using your lexicon.txt file, but then the error is: "There were phones in the dictionary that do not have acoustic models: a, c, d, e, i, o, q, u, y, à, á, â, ã, è, é, ê, ì, í, ò, ó, ô, õ, ù, ú, ý, ă, đ, ĩ, ũ, ơ, ư, ạ, ả, ấ, ầ, ẩ, ẫ, ậ, ắ, ằ, ẳ, ẵ, ặ, ẹ, ẻ, ẽ, ế, ề, ể, ễ, ệ, ỉ, ị, ọ, ỏ, ố, ồ, ổ, ỗ, ộ, ớ, ờ, ở, ỡ, ợ, ụ, ủ, ứ, ừ, ử, ữ, ự, ỳ, ỵ, ỷ, ỹ"

Could you share the g2p and acoustic models you used? Thank you so much!

@TruongThuyLiem If I understand correctly, you are using the pretrained models from the MFA website, which use a different phoneme set and are also missing some of the phones you listed.

If you still want to use the pretrained model from MFA, you can easily modify the lexicon generated by the g2p model, replacing each missing phone with a similar-sounding one, for example the same phone with a different tone.
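A minimal sketch of that substitution in Python, assuming a plain lexicon with one `word phone1 phone2 ...` entry per line (no pronunciation-probability columns) and phones named `<base>_T<tone>`, as in the first error message. You have to supply the set of phones the acoustic model supports yourself (for example, copied from the model's phone list); it is not read from MFA here. `parse_lexicon`, `pick_substitute`, `patch_lexicon` and `format_lexicon` are hypothetical helpers, not part of MFA or this repo.

```python
import re

# Assumption: toned phones look like "<base>_T<tone>", e.g. "au_T4" (inferred from the MFA error).
TONED = re.compile(r"^(?P<base>.+)_T(?P<tone>\d+)$")


def parse_lexicon(lines):
    """Parse whitespace-separated 'word phone1 phone2 ...' lines into (word, phones) pairs."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            entries.append((parts[0], parts[1:]))
    return entries


def pick_substitute(phone, available):
    """Pick a replacement for a phone the acoustic model lacks.

    Prefers the same base phone with the numerically closest tone (ties go to the
    lower tone). Returns None when no toned variant of the base is available.
    """
    m = TONED.match(phone)
    if m is None:
        return None
    base, tone = m.group("base"), int(m.group("tone"))
    candidates = []
    for p in available:
        pm = TONED.match(p)
        if pm and pm.group("base") == base:
            other = int(pm.group("tone"))
            candidates.append((abs(other - tone), other, p))
    if not candidates:
        return None
    return min(candidates)[2]


def patch_lexicon(entries, available):
    """Replace unsupported phones; also return the phones that could not be replaced."""
    patched, unresolved = [], set()
    for word, phones in entries:
        new_phones = []
        for p in phones:
            if p in available:
                new_phones.append(p)
                continue
            sub = pick_substitute(p, available)
            if sub is None:
                unresolved.add(p)  # keep as-is; needs a manual replacement
                new_phones.append(p)
            else:
                new_phones.append(sub)
        patched.append((word, new_phones))
    return patched, unresolved


def format_lexicon(entries):
    """Serialize entries back to 'word<TAB>phone phone ...' lines (assumed output layout)."""
    return [f"{word}\t{' '.join(phones)}" for word, phones in entries]
```

Picking the numerically closest tone label is only a crude stand-in for "similar-sounding"; a hand-written mapping may work better. Anything returned in `unresolved` (for example `uou_T1` when the model has no `uou` phone at all) still needs a replacement chosen by hand.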

You can also train your own MFA model. To do that, you need to generate your own lexicon file and spend a few hours training on your dataset. Here is the link to a colab notebook that I used to train a new MFA model. Note that the notebook generates a slightly different phoneme set from the one used in this repo.
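The second error message lists individual letters (a, c, d, à, á, ...) as phones. This suggests that the lexicon in this repo uses each character of a word as one phone. Assuming that character-level layout (check it against the repo's actual lexicon.txt first), a lexicon for your own dataset could be generated from its transcripts roughly like this. `words_from_transcripts` and `char_lexicon` are hypothetical helpers, and dropping punctuation is an assumption.

```python
import unicodedata


def words_from_transcripts(transcripts):
    """Collect the unique lowercase words across all transcripts, sorted."""
    words = set()
    for text in transcripts:
        # NFC keeps precomposed Vietnamese letters such as "ệ" as a single character,
        # so the combining marks are not lost by the isalpha() filter below.
        text = unicodedata.normalize("NFC", text).lower()
        for token in text.split():
            # Assumption: punctuation is not a phone, so strip it from each token.
            token = "".join(ch for ch in token if ch.isalpha())
            if token:
                words.add(token)
    return sorted(words)


def char_lexicon(words):
    """Map each word to its characters, one phone per character ('word<TAB>c h a r')."""
    return [f"{w}\t{' '.join(w)}" for w in words]
```

For example, `char_lexicon(words_from_transcripts(["Xin chào!", "chào bạn"]))` gives one line each for `bạn`, `chào` and `xin`, such as `chào\tc h à o`.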

Thank you so much