NVIDIA/waveglow

WaveGlow MelAudioLoader Question

dxlin17 opened this issue · 0 comments

Hi All,

I was looking at examples of speech synthesis with Tacotron2 + WaveGlow and saw a Tacotron option called --load-mel-from-disk. WaveGlow doesn't seem to use this argument, because its data loader randomly selects <segment_length> audio samples from each input audio file and then generates the Mel spectrogram from that segment. Is my understanding correct?

If so, is there any reasonable scenario (aside from precomputing a Mel spectrogram for every possible random clip of every audio file) where the Mel spectrograms could be generated before training and loaded from disk, rather than being regenerated from random samples each time?
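To make the question concrete, here is a rough sketch of the kind of thing I have in mind: compute one Mel spectrogram per full audio file ahead of time, then at training time crop a random window of frames and the matching audio samples. This is not code from the WaveGlow repo. The function name `random_aligned_crop` and its parameters are my own, and I am assuming that frame `i` of the stored spectrogram lines up with sample `i * hop_length`, which depends on how the STFT pads its input. Frames near the edges of a crop would also differ slightly from a spectrogram computed on the isolated clip.

```python
import numpy as np


def random_aligned_crop(audio, mel, segment_length, hop_length, rng):
    """Crop a random mel window and the matching audio samples.

    Hypothetical helper, not part of WaveGlow.
    audio: 1-D array of samples for the whole file.
    mel:   2-D array (n_mels, n_frames) precomputed for the whole file.
    Assumes mel frame i corresponds to audio sample i * hop_length.
    """
    # Round the segment down to a whole number of hops.
    frames_per_segment = segment_length // hop_length
    # Only use frames that are fully backed by audio samples.
    n_frames = min(mel.shape[1], len(audio) // hop_length)
    if n_frames < frames_per_segment:
        raise ValueError("audio file is shorter than one segment")

    # Upper bound is exclusive, so +1 allows the last valid start frame.
    start_frame = int(rng.integers(0, n_frames - frames_per_segment + 1))
    start_sample = start_frame * hop_length

    mel_segment = mel[:, start_frame:start_frame + frames_per_segment]
    audio_segment = audio[start_sample:start_sample + frames_per_segment * hop_length]
    return mel_segment, audio_segment


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = np.arange(16000, dtype=np.float32)          # stand-in for a loaded waveform
    mel = np.zeros((80, 63), dtype=np.float32)          # stand-in for a spectrogram loaded from disk
    mel_seg, audio_seg = random_aligned_crop(audio, mel, 4096, 256, rng)
    print(mel_seg.shape, audio_seg.shape)
```

With this approach the audio crop always starts on a hop boundary, so there are fewer distinct crops than with a fully random sample offset. I am not sure whether that matters for training quality.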

Thank you!