howardyclo/pytorch-seq2seq-example

Two questions about LuongAttnDecoderRNN

mirror111 opened this issue · 3 comments

In the LuongAttnDecoderRNN:

1. When t=0, decoder_hidden is the encoder's last hidden state with shape (num_layers * num_directions, batch_size, hidden_size), but in EncoderRNN the last hidden state has shape (num_layers, batch_size, hidden_size * num_directions). Is that right?
2. There is a line of code that reads
   decoder_output, decoder_hidden = decoder.rnn(emb, decoder_hidden)
   but I think it should be
   decoder_output, decoder_hidden = self.rnn(emb, decoder_hidden)
   (a short sketch of the corrected decoder step follows below).
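To show where the corrected line sits, here is a minimal, self-contained sketch of a GRU decoder step. The class and variable names below are illustrative assumptions, not the repo's exact code:

import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Illustrative GRU decoder; only shows why the layer must be called via self.rnn."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size)

    def forward(self, input_step, decoder_hidden):
        emb = self.embedding(input_step)  # (1, batch_size, hidden_size)
        # Inside the class, the GRU must be referenced through `self.rnn`;
        # `decoder.rnn` would depend on an external variable named `decoder`.
        decoder_output, decoder_hidden = self.rnn(emb, decoder_hidden)
        return decoder_output, decoder_hidden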

@mirror111 Hello! Thanks for opening the issue:

  1. Correct (see the sketch after this list for one way to reconcile the two shapes).
  2. Yes, nice catch!
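For reference, here is a minimal sketch (assumed shapes and variable names, not the repo's exact code) of how a bidirectional encoder's final hidden state can be rearranged so that a decoder expecting (num_layers, batch_size, hidden_size * num_directions) can consume it:

import torch

num_layers, num_directions, batch_size, hidden_size = 2, 2, 4, 8

# What nn.GRU(bidirectional=True) returns as its final hidden state:
encoder_hidden = torch.randn(num_layers * num_directions, batch_size, hidden_size)

# Separate the direction axis, then concatenate the two directions' features per layer.
h = encoder_hidden.view(num_layers, num_directions, batch_size, hidden_size)
decoder_hidden = torch.cat([h[:, 0], h[:, 1]], dim=2)

print(decoder_hidden.shape)  # torch.Size([2, 4, 16]) == (num_layers, batch_size, hidden_size * num_directions)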

In the function evaluate():

1. I think the lines
   encoder_optim.zero_grad()
   decoder_optim.zero_grad()
   are unnecessary, and evaluate() doesn't take those optimizers as parameters either.
2. In the evaluate section, should the decoder's input at each step come from the top word of the decoder's previous output, or from the real target?

@mirror111 Hello,

  1. You're right, thanks for the catch.
  2. It should come from the decoder's own output (the same as in the translation section); see the sketch after this list. That said, I think it may also be fine to feed the previous target word to the decoder at the current decoding time step, just as in training mode. Thanks again!
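For clarity, here is a minimal sketch of such a greedy evaluation loop. The encoder/decoder call signatures, sos_token, eos_token, max_length, and the (seq_len, batch_size) tensor layout below are assumptions for illustration, not the repo's exact API:

import torch

def evaluate(encoder, decoder, src_seqs, src_lens, sos_token, eos_token, max_length=50):
    encoder.eval()
    decoder.eval()
    with torch.no_grad():  # no gradients in evaluation, so no optimizers / zero_grad() needed
        encoder_outputs, encoder_hidden = encoder(src_seqs, src_lens)
        decoder_hidden = encoder_hidden  # after any reshaping needed (see the earlier sketch)
        batch_size = src_seqs.size(1)    # assuming (seq_len, batch_size) layout
        decoder_input = torch.full((batch_size,), sos_token, dtype=torch.long)
        decoded_tokens = []
        for t in range(max_length):
            decoder_output, decoder_hidden, attn_weights = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            # Greedy decoding: feed the decoder's own top prediction back in,
            # instead of the ground-truth target word used during training.
            decoder_input = decoder_output.argmax(dim=1)
            decoded_tokens.append(decoder_input)
            if (decoder_input == eos_token).all():
                break
    return torch.stack(decoded_tokens)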

You are welcome to send me a pull request :-)