about Mel_Post_Net
Opened this issue · 1 comment
Many thanks for your great work.
I have a question about the Post-Net after the mel linear layer. In the inference stage, you use `mel_pred` as the output rather than `postnet_pred`, as shown below (in synthesis.py):
```python
with t.no_grad():
    for i in pbar:
        pos_mel = t.arange(1, mel_input.size(1) + 1).unsqueeze(0).cuda()
        mel_pred, postnet_pred, attn, stop_token, _, attn_dec = m.forward(text, mel_input, pos_text, pos_mel)
        mel_input = t.cat([mel_input, mel_pred[:, -1:, :]], dim=1)  # note: mel_pred, not postnet_pred
```
Also, when I run the code, I find that `post_mel_loss` is always larger than `mel_loss`, which suggests the Post-Net module doesn't work as expected, right? This seems to conflict with the Post-Net used in Tacotron 2 and in the original TTS-Transformer paper. I am a bit confused; could you explain it to me? Many thanks!
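For context, a post-net is normally trained as a residual refiner on top of the decoder output, so `post_mel_loss` is expected to drop below `mel_loss` once training converges. A minimal PyTorch sketch of that wiring, where the module names and sizes are illustrative assumptions rather than this repo's actual code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

decoder_out = torch.randn(2, 50, 80)   # (batch, frames, n_mels) from the decoder
target      = torch.randn(2, 50, 80)   # ground-truth mel target

# A residual post-net refines the coarse decoder output; a single Linear
# stands in here for the Conv1d / Bi-GRU stack being discussed.
postnet = nn.Sequential(nn.Linear(80, 80))
postnet_out = decoder_out + postnet(decoder_out)   # residual connection

mel_loss      = nn.functional.l1_loss(decoder_out, target)
post_mel_loss = nn.functional.l1_loss(postnet_out, target)
# With an untrained post-net the two losses are comparable; after training,
# post_mel_loss should fall below mel_loss if the post-net actually helps.
```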
Thanks for the great work.
I have the same question as @liuxubo717. Comparing with Tacotron 2, I find that its post-net is built from `torch.nn.Conv1d` layers. I don't understand why you use a Bi-GRU in your post-net instead. Does it work better than Tacotron 2's post-net? Could you explain? Thanks.
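To make the comparison concrete, here are minimal sketches of the two post-net styles under discussion; the layer sizes and hyperparameters are illustrative assumptions, not the exact values of either implementation:

```python
import torch
import torch.nn as nn

class ConvPostNet(nn.Module):
    """Tacotron 2-style post-net: a stack of 1-D convolutions over time."""
    def __init__(self, n_mels=80, channels=512, kernel=5, layers=3):
        super().__init__()
        dims = [n_mels] + [channels] * (layers - 1) + [n_mels]
        self.convs = nn.ModuleList(
            nn.Conv1d(dims[i], dims[i + 1], kernel, padding=kernel // 2)
            for i in range(layers))

    def forward(self, x):                 # x: (batch, frames, n_mels)
        y = x.transpose(1, 2)             # Conv1d expects (batch, channels, frames)
        for i, conv in enumerate(self.convs):
            y = conv(y)
            if i < len(self.convs) - 1:   # tanh on all but the final layer
                y = torch.tanh(y)
        return x + y.transpose(1, 2)      # residual refinement

class GRUPostNet(nn.Module):
    """Bi-GRU post-net: a recurrent alternative with full-sequence context."""
    def __init__(self, n_mels=80, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_mels)

    def forward(self, x):
        y, _ = self.gru(x)
        return x + self.proj(y)

x = torch.randn(2, 50, 80)
assert ConvPostNet()(x).shape == x.shape
assert GRUPostNet()(x).shape == x.shape
```

The convolutional version smooths each frame using a fixed local window, while the Bi-GRU can propagate information across the whole utterance; whether that extra context helps here is exactly the question being asked.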