facebookresearch/fairseq-lua

question about network training

linhanxiao opened this issue · 3 comments

When training the network, the input is sample.input = {{targetIn, targetPosIn}, sourceIn}
and the loss is computed as crit:forward(net.output, sample.target). But {targetIn, targetPosIn} and sample.target are essentially the same thing, so the network input already contains information about the sample's target. Why is the network input constructed that way?

Yes, the training process requires the target information of the sample: it uses the previous token to predict the current token, and the current token to predict the next token.

[Image: seq2seq encoder-decoder diagram]
(credit: https://www.tensorflow.org/tutorials/seq2seq)
In the above image, <go>, W, X, Y, Z at the bottom are the inputs to the decoder, and W, X, Y, Z, EOS at the top are the expected outputs of the decoder. When W is fed into the model, the decoder is expected to produce X, and so on. That's why most seq2seq models have two target sequences (offset by one time step): one as the decoder input, and one for loss evaluation.
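To make the one-step offset concrete, here is a minimal Python sketch of how the two target tensors could be built (teacher forcing). The token ids, the function name, and the BOS/EOS conventions are assumptions for illustration, not the actual fairseq-lua internals:

```python
# Hypothetical token ids for illustration: 1 = <go>/BOS, 2 = EOS.
GO, EOS = 1, 2

def make_decoder_tensors(target_tokens):
    """Given the target sentence, build the decoder input and the loss target.

    decoder input : <go> W X Y Z   (target shifted right by one step)
    loss target   : W X Y Z EOS    (what the decoder should emit)
    """
    decoder_input = [GO] + target_tokens   # corresponds to targetIn in the issue
    loss_target = target_tokens + [EOS]    # corresponds to sample.target
    return decoder_input, loss_target

tokens = [10, 11, 12, 13]  # ids for "W X Y Z"
dec_in, tgt = make_decoder_tensors(tokens)
print(dec_in)  # [1, 10, 11, 12, 13]
print(tgt)     # [10, 11, 12, 13, 2]
```

At each step t, the decoder sees dec_in[t] (the previous ground-truth token) and is scored against tgt[t] (the next token), which is why both sequences are needed during training.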

Thank you very much.

@frankang is right. We basically treat a seq2seq model as a language model during training; that is, we have it predict the next word in the target language given the current words in the target language. The source-language sentence is provided as an "extra" input.
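The "language model over the target, conditioned on the source" view can be sketched as a per-step negative log-likelihood. This toy function is an assumption-laden illustration of the training objective, not the actual ClassNLLCriterion code in fairseq-lua:

```python
import math

def nll_loss(step_probs, loss_target):
    """Sum of -log p(correct next token) over decoder time steps.

    step_probs[t] maps token id -> the model's probability of emitting that
    token at step t, given the source sentence and the ground-truth target
    tokens before t (teacher forcing).
    """
    return sum(-math.log(probs[tok])
               for probs, tok in zip(step_probs, loss_target))

# A model that is perfectly confident in the correct tokens has zero loss.
probs = [{10: 1.0}, {11: 1.0}]
print(nll_loss(probs, [10, 11]))  # 0.0
```

The key point mirrors the thread: the probabilities at step t are conditioned on target tokens before t, so the target must appear on the input side as well as in the loss.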