Rongjiehuang/GenerSpeech

Mismatch between paper and the implementation

Closed this issue · 2 comments

Hi,
Thank you for your work. I noticed a mismatch between the paper and your implementation of the postnet. In the paper, the postnet is conditioned on the decoder input and the mel-decoder output, whereas in the implementation the postnet is conditioned on the decoder input and other conditions (speaker, emotion, and prosody). It does not appear to be conditioned on the output of the transformer decoder. Is there a reason for this mismatch?

Kind regards

Hi, please refer to https://github.com/Rongjiehuang/GenerSpeech/blob/main/modules/GenerSpeech/model/generspeech.py#L242
It conditions Glow on the transformer decoder output and the acoustic conditions (speaker, emotion, and prosody).
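
For readers following along, here is a minimal sketch of what "conditioning on the decoder output and the acoustic conditions" can look like. It is not the code at the linked line: the function name `build_postnet_condition`, the toy shapes, and the choice of channel-wise concatenation are all assumptions made for illustration (the thread does not say whether the repository concatenates or sums), and prosody is treated as a single utterance-level vector only to keep the example small.

```python
def build_postnet_condition(decoder_out, utterance_conds):
    """Append utterance-level condition vectors to every decoder frame.

    decoder_out: list of T frames, each a list of floats
        (stand-in for the transformer decoder output; hypothetical shape).
    utterance_conds: list of vectors (e.g. speaker, emotion, prosody),
        each a list of floats, assumed constant over time in this sketch.
    Returns a list of T frames whose channels are
    [decoder channels..., condition channels...].
    """
    # Flatten all condition vectors into one list of extra channels.
    extra = [value for cond in utterance_conds for value in cond]
    # Broadcast over time: every frame gets the same extra channels.
    return [list(frame) + extra for frame in decoder_out]


# Toy inputs (hypothetical values): T = 3 frames, 2 decoder channels.
decoder_out = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker = [1.0]
emotion = [0.0, 1.0]
prosody = [0.5]

cond = build_postnet_condition(decoder_out, [speaker, emotion, prosody])
```

In this sketch the resulting `cond` has one row per frame, and a flow-based postnet would receive it as its conditioning input, so the decoder output is part of what the postnet sees.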

Thank you for the quick response. I had overlooked that part of the code and have only just noticed it.
Thanks