Integration of FastSpeech with SqueezeWave vocoder
alokprasad opened this issue · 3 comments
Placeholder for issues related to integrating FastSpeech with SqueezeWave:
https://github.com/tianrengao/SqueezeWave
It seems to be quite a bit faster than waveflow.
I tried saving mel_postnet_torch (the mel spectrogram) to a .pt file and then used that file to generate a wav
with SqueezeWave, but I get the following error:
Traceback (most recent call last):
  File "inference.py", line 87, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 57, in main
    audio = squeezewave.infer(mel, sigma=sigma).float()
  File "/mount/data/SqueezeWave/glow.py", line 261, in infer
    output = self.WN[k]((audio_0, spect))
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mount/data/SqueezeWave/glow.py", line 165, in forward
    spect = self.cond_layer(spect)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead
Any idea what could be the issue?
Saving mel_postnet_torch as follows produces a file that can be used as the input to SqueezeWave:
melspec = torch.squeeze(mel_postnet_torch, 0)
torch.save(melspec, "/tmp/test.pt")
test.pt is then the mel-spectrogram input to SqueezeWave.
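For anyone hitting the same error, here is a minimal sketch of why the squeeze helps. It rests on two assumptions that are not confirmed in this thread: that mel_postnet_torch has shape [1, 80, 133], and that the vocoder's inference script adds a batch dimension after loading the file (modelled by the hypothetical helper load_like_vocoder). The Conv1d layer is only a stand-in with the weight shape from the error message, not the real SqueezeWave layer.

```python
import torch

# Stand-in layer with the weight shape from the error message: [2048, 80, 1].
cond_layer = torch.nn.Conv1d(in_channels=80, out_channels=2048, kernel_size=1)

# Assumed shape of the FastSpeech output: [batch, n_mels, frames].
mel_postnet_torch = torch.randn(1, 80, 133)


def load_like_vocoder(saved):
    """Hypothetical model of the inference script: add a batch dimension."""
    return torch.unsqueeze(saved, 0)


# Saving the tensor unchanged gives a 4-D input, which Conv1d rejects.
bad = load_like_vocoder(mel_postnet_torch)  # [1, 1, 80, 133]
try:
    cond_layer(bad)
    failed = False
except RuntimeError:
    failed = True

# Squeezing dimension 0 before saving gives the expected 3-D input.
melspec = torch.squeeze(mel_postnet_torch, 0)  # [80, 133]
good = load_like_vocoder(melspec)              # [1, 80, 133]
out = cond_layer(good)

print("4-D input rejected:", failed)
print("output shape with squeezed input:", tuple(out.shape))
```

If your mel tensor has a different layout, print its shape right before saving and check that it is [n_mels, frames].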
The following text: "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition in being comparatively modern"
was generated astonishingly fast on a single CPU core (no GPU); the times include model loading.
11.5 seconds of audio were generated in around 3.83 seconds.
MEL calculation time (s):
2.827802896499634
SqueezeWave vocoder time (s):
1.0016820430755615
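To put these numbers in perspective, here is a small sketch that turns them into a real-time factor (processing time divided by audio duration). The values are copied from this post and are assumed to be in seconds; the variable names are only illustrative.

```python
# Timings copied from the post (assumed to be in seconds).
mel_time = 2.827802896499634       # FastSpeech mel-spectrogram generation
vocoder_time = 1.0016820430755615  # SqueezeWave vocoder
audio_duration = 11.5              # length of the generated audio

total_time = mel_time + vocoder_time

# Real-time factor: values below 1.0 mean faster than real time.
real_time_factor = total_time / audio_duration
speedup = audio_duration / total_time

print(f"total time: {total_time:.2f} s")
print(f"real-time factor: {real_time_factor:.2f}")
print(f"speedup over real time: {speedup:.1f}x")
```

So the whole pipeline runs at roughly three times real time on one CPU core, with most of the time spent on the mel calculation rather than the vocoder.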