Text2Mel input to WaveGlow outputs noisy audio file without any speech

Question

Opened this issue 5 years ago · 1 comment

I've retrained the Text2Mel model (described in [https://arxiv.org/pdf/1710.08969.pdf]) after removing the mel reduction step from the preprocessor and changing the hparams to:

hop_length = 256
win_length = 1024
max_N = 180 # Maximum number of characters.
max_T = 210 # Maximum number of mel frames.
e = 512 # embedding dimension
d = 256 # Text2Mel hidden unit dimension

I'm trying to feed the generated mels to WaveGlow, but the output audio is just a noisy honk with no recognizable speech.
Any ideas?

Answer 1 · 2020-06-25T21:30:50.000Z

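Make sure the mel-spectrogram preprocessing used to train Text2Mel matches the one WaveGlow was trained with. Matching hop_length and win_length is not enough: the sampling rate, FFT size, number of mel channels, mel frequency range, and the scaling applied to the values (log vs. dB, any normalization or pre-emphasis) all have to agree, otherwise the vocoder receives inputs it has never seen and produces noise.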
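One way to check this is to write both sets of settings side by side and list every difference. The sketch below is only an illustration: the WaveGlow values are what I believe the published default configuration uses, and every Text2Mel value other than hop_length and win_length (which come from the question) is a hypothetical placeholder, so replace both dictionaries with the numbers from your own code. The conversion function assumes, hypothetically, that the Text2Mel preprocessor stores mels as decibels rescaled to [0, 1] using max_db and ref_db; if yours does something else, that part does not apply.

```python
import math

# Assumed WaveGlow defaults; verify against the config shipped with your checkpoint.
WAVEGLOW_MEL = {
    "sampling_rate": 22050,
    "filter_length": 1024,   # FFT size
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
    "scale": "natural log of amplitude, clamped at 1e-5",
}

# hop_length and win_length are from the question; the rest are placeholders.
TEXT2MEL_MEL = {
    "sampling_rate": 22050,
    "filter_length": 2048,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": 11025.0,
    "scale": "dB rescaled to [0, 1]",
}


def diff_mel_configs(vocoder, acoustic):
    """Return {key: (vocoder_value, acoustic_value)} for every setting that differs."""
    keys = sorted(set(vocoder) | set(acoustic))
    return {
        k: (vocoder.get(k), acoustic.get(k))
        for k in keys
        if vocoder.get(k) != acoustic.get(k)
    }


def db_normalized_to_log_amplitude(x, max_db=100.0, ref_db=20.0):
    """Map one mel value from a [0, 1] dB scale to a clamped natural-log amplitude.

    Assumes the forward normalization was (db - ref_db + max_db) / max_db,
    with db = 20 * log10(amplitude). Both defaults are hypothetical.
    """
    db = x * max_db - max_db + ref_db
    amplitude = 10.0 ** (db / 20.0)
    return math.log(max(amplitude, 1e-5))


mismatches = diff_mel_configs(WAVEGLOW_MEL, TEXT2MEL_MEL)
for key, (vocoder_value, acoustic_value) in mismatches.items():
    print(f"{key}: WaveGlow={vocoder_value!r}  Text2Mel={acoustic_value!r}")
```

Rescaling the values only fixes a difference in scale. A different FFT size, mel frequency range, or pre-emphasis filter changes the content of the spectrogram itself, so in those cases the safer route is to redo the preprocessing with WaveGlow's settings and retrain Text2Mel on the result.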