NTT123/vietTTS

How to add marker of sil, sp to TextGrid after MFA?

nampdn opened this issue · 6 comments

Hi @NTT123,
First of all, thank you for your brilliant work! I have successfully trained my dataset with MFA, but the generated .TextGrid files do not contain the markers for silence (sil) and short pauses (sp). Could you please help me detect these and add the symbols to the TextGrid files?

Hi @nampdn, thank you for reporting this. The newest version of MFA removes these markers.

According to MontrealCorpusTools/Montreal-Forced-Aligner#377, you have to run mfa align or mfa train with the additional argument --disable_textgrid_cleanup.
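A minimal sketch of that invocation from Python. It assumes the MFA 2.x argument order (corpus directory, dictionary, acoustic model, output directory). The file and directory names are hypothetical placeholders. Option names have changed between MFA releases, so check mfa align --help for the version you have installed.

```python
# Sketch: build (and optionally run) an "mfa align" call that keeps sil/sp markers.
# All paths are hypothetical placeholders.
import subprocess
from typing import List


def build_mfa_align_cmd(corpus_dir: str, dictionary_path: str,
                        acoustic_model: str, output_dir: str) -> List[str]:
    # Positional order assumed from the MFA 2.x CLI:
    #   mfa align CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH OUTPUT_DIRECTORY
    return [
        "mfa", "align",
        corpus_dir, dictionary_path, acoustic_model, output_dir,
        # Flag named in MFA issue #377: skip the TextGrid cleanup step
        # that strips the silence markers.
        "--disable_textgrid_cleanup",
    ]


def run_mfa(cmd: List[str]) -> None:
    # Requires MFA to be installed; raises CalledProcessError on failure.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_mfa_align_cmd("corpus/", "lexicon.txt", "model.zip", "textgrids/")
    print(" ".join(cmd))
```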

@nampdn, please check out the fix_sil branch for a quick fix. This branch can read TextGrid files that have no "sil" or "sp" markers.
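The fix_sil branch's code is not shown in this thread. The following is only a rough illustration of the idea. It uses a hypothetical helper, add_silence_markers, which is not part of vietTTS. The helper works on word intervals that have already been parsed from a TextGrid. It relabels empty intervals as "sil" and fills any uncovered time (in case the parser drops empty intervals) with "sil". Parsing the TextGrid file is left out, and the helper makes no attempt to tell "sp" apart from "sil".

```python
from typing import List, Tuple

# (start_sec, end_sec, label), e.g. parsed from the "words" tier of a TextGrid
Interval = Tuple[float, float, str]

EPS = 1e-6  # tolerance when comparing interval boundaries


def _append(out: List[Interval], start: float, end: float, label: str, sil: str) -> None:
    # Merge adjacent silences into a single marker instead of stacking them.
    if out and label == sil and out[-1][2] == sil and abs(out[-1][1] - start) < EPS:
        out[-1] = (out[-1][0], end, sil)
    else:
        out.append((start, end, label))


def add_silence_markers(intervals: List[Interval], total_dur: float,
                        sil: str = "sil") -> List[Interval]:
    """Relabel empty intervals and fill uncovered gaps with a silence marker."""
    out: List[Interval] = []
    cursor = 0.0
    for start, end, label in sorted(intervals):
        if start > cursor + EPS:  # gap that no interval covers
            _append(out, cursor, start, sil, sil)
        _append(out, start, end, label if label.strip() else sil, sil)
        cursor = max(cursor, end)
    if total_dur > cursor + EPS:  # trailing silence after the last word
        _append(out, cursor, total_dur, sil, sil)
    return out
```

For example, [(0.2, 0.5, "ba"), (0.5, 0.7, ""), (0.9, 1.2, "mét")] with a total duration of 1.5 s becomes [(0.0, 0.2, "sil"), (0.2, 0.5, "ba"), (0.5, 0.9, "sil"), (0.9, 1.2, "mét"), (1.2, 1.5, "sil")].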

Woot! I'm so grateful. I'll try it now.
Have a happy holiday!

Hi @NTT123,
After pulling the latest fixes for sil, I still have a problem with some utterances that contain numbers.

('n', 'g', 'ư', 'ờ', 'i', ' ', 'đ', 'o', ' ', 'c', 'h', 'i', 'ề', 'u', ' ', 'r', 'ộ', 'n', 'g', ' ', 'c', 'ủ', 'a', ' ', 'l', 'ố', 'i', ' ', 'v', 'à', 'o', ' ', 'c', 'ổ', 'n', 'g', ' ', 'sil', 'l', 'à', ' ', 'n', 'ă', 'm', ' ', 'sil', '3', ' ', 'm', 'é', 't', ' ', 'sil', 'v', 'à', ' ', 'c', 'h', 'i', 'ề', 'u', ' ', 'd', 'à', 'i', ' ', 'l', 'à', ' ', 's', 'á', 'u', ' ', 'sil', '9', ' ', 'm', 'é', 't', ' ', 'sil')
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 181, in <module>
    train()
  File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 100, in train
    batch = next(train_data_iter)
  File "/content/vietTTS/vietTTS/nat/data_loader.py", line 111, in load_textgrid_wav
    ps = [phonemes.index(p) for p in ps]
  File "/content/vietTTS/vietTTS/nat/data_loader.py", line 111, in <listcomp>
    ps = [phonemes.index(p) for p in ps]
ValueError: '3' is not in list

Can you take a look at this sample? Can I add 0-9 to the phonemes list, or do I have to spell the numbers out as readable text?
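The ValueError comes from phonemes.index(p), which raises as soon as a symbol is missing from the phoneme list. Below is a minimal sketch of a pre-training check that lists every such symbol per utterance, so that all offending transcripts can be found in one pass. The function name and the small phoneme subset in the test are hypothetical, not vietTTS's actual list.

```python
from typing import Dict, List, Sequence


def find_unknown_symbols(transcripts: Dict[str, Sequence[str]],
                         phonemes: Sequence[str]) -> Dict[str, List[str]]:
    """Map utterance id -> sorted symbols that are missing from `phonemes`.

    Running this before training surfaces every symbol that would make
    `phonemes.index(p)` raise ValueError, instead of failing on the first one.
    """
    known = set(phonemes)
    report: Dict[str, List[str]] = {}
    for utt_id, symbols in transcripts.items():
        missing = sorted({s for s in symbols if s not in known})
        if missing:
            report[utt_id] = missing
    return report
```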

You have to normalize the transcripts. For example, "3" should be converted to "ba".
This is why numbers are not included in the phonemes list.
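Below is a minimal normalization sketch using Python's re module. It assumes the standard Vietnamese number readings for 0-99: "mười" for 10, "lăm" for a final 5, and "mốt" for a final 1 when the tens digit is 2 or more. This is not the normalizer used by vietTTS. Numbers of 100 and above are simply read digit by digit, and decimals, dates, and units are not handled.

```python
import re

DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def read_number(n: int) -> str:
    """Spell out 0-99 in Vietnamese; larger numbers fall back to digit-by-digit."""
    if n < 10:
        return DIGITS[n]
    if n < 100:
        tens, unit = divmod(n, 10)
        head = "mười" if tens == 1 else DIGITS[tens] + " mươi"
        if unit == 0:
            return head
        if unit == 5:
            tail = "lăm"            # 15 -> mười lăm, 25 -> hai mươi lăm
        elif unit == 1 and tens > 1:
            tail = "mốt"            # 21 -> hai mươi mốt (but 11 -> mười một)
        else:
            tail = DIGITS[unit]
        return f"{head} {tail}"
    # Simplification: read long numbers digit by digit.
    return " ".join(DIGITS[int(d)] for d in str(n))


def normalize_numbers(text: str) -> str:
    """Replace every run of digits in `text` with its Vietnamese reading."""
    return re.sub(r"\d+", lambda m: read_number(int(m.group())), text)
```

With this, normalize_numbers("là 3 mét") returns "là ba mét". Because the TextGrids are produced from the transcripts, the normalized corpus presumably needs to be realigned with MFA before training again.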

Oh, I get it now, cheers!