NVIDIA/waveglow

Text2Mel input to WaveGlow outputs noisy audio file without any speech

I've retrained the Text2Mel model (described in https://arxiv.org/pdf/1710.08969.pdf) after removing the mel reduction step from the preprocessor and changing the hparams to:

hop_length = 256
win_length = 1024
max_N = 180 # Maximum number of characters.
max_T = 210 # Maximum number of mel frames.
e = 512 # embedding dimension
d = 256 # Text2Mel hidden unit dimension

I'm trying to feed the generated mels to WaveGlow, but the output audio file is just a noisy honk with no speech in it.
Any ideas?

Make sure the mel-spectrogram preprocessing matches the one used in WaveGlow.
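One way to check this is to put both sets of preprocessing settings side by side and list every field that differs. The sketch below is only an illustration: the WaveGlow values are what I recall from the `data_config` section of the repository's `config.json` and should be verified against your checkout, the `text2mel` dictionary is a hypothetical stand-in for your own hparams (only `hop_length` and `win_length` come from the post above), and `diff_preprocessing` and `normalized_db_to_log_mel` are helper names made up for this example. The second helper assumes your Text2Mel preprocessor scales mels to [0, 1] in dB with `max_db` and `ref_db`, which may not be true of your code.

```python
import math

# Values as recalled from NVIDIA/waveglow's config.json (data_config);
# treat them as assumptions and verify against your own checkout.
WAVEGLOW_PREPROCESSING = {
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
    "compression": "natural log of clamp(mel, min=1e-5)",
}


def diff_preprocessing(candidate, reference=WAVEGLOW_PREPROCESSING):
    """Return {key: (candidate_value, reference_value)} for every mismatch."""
    mismatches = {}
    for key, expected in reference.items():
        actual = candidate.get(key, "<missing>")
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches


def normalized_db_to_log_mel(value, max_db=100.0, ref_db=20.0):
    """Undo an ASSUMED [0, 1] dB normalization and return a natural-log value.

    Assumes value = (20 * log10(mag) - ref_db + max_db) / max_db.
    The defaults are placeholders, not values taken from the issue.
    """
    db = value * max_db - max_db + ref_db
    return db / 20.0 * math.log(10.0)


# Hypothetical Text2Mel settings; replace with the ones you trained with.
text2mel = {
    "sampling_rate": 22050,
    "filter_length": 2048,
    "hop_length": 256,   # from the hparams in the post
    "win_length": 1024,  # from the hparams in the post
    "n_mel_channels": 80,
    "compression": "dB normalized to [0, 1]",
}

for key, (actual, expected) in diff_preprocessing(text2mel).items():
    print(f"{key}: got {actual!r}, WaveGlow expects {expected!r}")
```

Matching `hop_length` and `win_length` alone is not enough: if the FFT size, mel frequency range, or amplitude scaling differ, the vocoder receives values outside the range it was trained on. Rescaling the generated mels may help with the amplitude part, but retraining Text2Mel on mels produced by WaveGlow's own preprocessing is the safer route.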