Problem about data processing

Question

Zhongxu-Wang opened this issue a year ago · 1 comment

I found this line of code in meldataset.py and I am curious about what it does. Why does wave need to be extended with zeros on both sides?
wave = torch.cat([torch.zeros([5000]), wave, torch.zeros([5000])], axis=0)

Answer 1 · 2023-06-21T03:40:51.000Z

This compensates for the silence tokens (token index 0) that mark the start and the end of the text for the text aligner. We add a silence token at the beginning and end of the text so that the aligner can align leading and trailing silences. However, some datasets, such as LibriTTS and LJSpeech, have no silence at the beginning or end of their recordings, so we also pad the waveform with silence (5000 zero samples on each side) for robustness.
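
To make the pairing concrete, here is a minimal sketch of the idea, not the actual code in meldataset.py. Plain Python lists stand in for the torch tensors so it runs without PyTorch, and the names SILENCE_TOKEN, PAD_SAMPLES, pad_text and pad_wave are hypothetical, chosen only for illustration. The token ids and sample values are arbitrary.

```python
# Hypothetical names; plain lists stand in for the torch tensors in the real code.
SILENCE_TOKEN = 0    # token index 0, as described in the answer
PAD_SAMPLES = 5000   # zero samples added per side, as in the quoted line


def pad_text(tokens):
    """Wrap the token sequence with one silence token on each side."""
    return [SILENCE_TOKEN] + list(tokens) + [SILENCE_TOKEN]


def pad_wave(wave, pad=PAD_SAMPLES):
    """Wrap the waveform with `pad` zero-valued samples on each side."""
    return [0.0] * pad + list(wave) + [0.0] * pad


tokens = pad_text([12, 7, 31])            # arbitrary example token ids
wave = pad_wave([0.1, -0.2, 0.3, 0.05])   # arbitrary example samples

print(tokens)     # [0, 12, 7, 31, 0]
print(len(wave))  # 10004
```

With this pairing, each silence token at the edge of the text has a stretch of real silence in the audio to align to, even when the original recording starts and ends abruptly.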