Kyubyong/tacotron

Hello, I am confused about the decoder. Is the previous output of the decoder RNN fed as the input of the next timestep, i.e. the input of the next Pre-net, as shown in Fig. 1 of the paper?

Taishanren80 opened this issue · 0 comments

At line 47 of train.py, self.decoder_input is passed to decode1. During training, self.decoder_input is the result of shift_by_one(self.y). During evaluation (line 45 of eval.py), self.decoder_input is a matrix of np.zeros.
So it seems to me that the previous output of the decoder RNN is not fed as the input of the next timestep. Is my understanding correct?
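To make the distinction concrete, here is a minimal sketch of the two feeding schemes as I understand them (the array names, shapes, and the `fake_decoder_step` helper are hypothetical illustrations, not the repo's actual code):

```python
import numpy as np

# Hypothetical mel targets: (batch, timesteps, n_mels) -- illustration only.
y = np.random.rand(2, 5, 80).astype(np.float32)

# Teacher forcing (what I understand train.py does): the decoder input at step t
# is the *ground-truth* frame from step t-1, i.e. y shifted right by one frame.
decoder_input_train = np.concatenate(
    [np.zeros_like(y[:, :1, :]), y[:, :-1, :]], axis=1)  # <GO> frame, then y[0..T-2]

# Autoregressive feedback (what Fig. 1 shows): the decoder input at step t is
# the decoder's *own* prediction from step t-1.
def fake_decoder_step(prev_frame):
    """Stand-in for one decoder RNN step; returns a dummy predicted frame."""
    return prev_frame * 0.9 + 0.1  # placeholder computation for illustration only

prev = np.zeros((2, 80), dtype=np.float32)  # <GO> frame
predictions = []
for t in range(5):
    prev = fake_decoder_step(prev)  # feed the previous output back in
    predictions.append(prev)
decoder_output_infer = np.stack(predictions, axis=1)

print(decoder_input_train.shape, decoder_output_infer.shape)  # (2, 5, 80) (2, 5, 80)
```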
Does the source code really implement the model in Fig. 1 of the paper "Tacotron: Towards End-to-End Speech Synthesis"?
How should I understand this? I would appreciate your help.
Thank you very much!