airsplay/R2R-EnvDrop

Aligning in speaker

zwx8981 opened this issue · 2 comments

input = logits[:, :, :-1], # -1 for aligning

Hi, I am confused about the aligning operation mentioned in your comment. It seems that you drop the last element of the predicted logits (the logits of 'EOS' or 'PAD') and the 'BOS' of the target when computing the loss while training the speaker, which, in my view, makes the logits and target unaligned. Can you explain how this operation aligns the logits and target?

Hmmm.

Let me explain it with an example. Suppose the sequence is "hello world". The input and desired output under teacher forcing would be:

Time:                  1       2       3       4
Input:              <bos>    hello   world   <eos>  
Desired Output:     hello    world   <eos>    ???

And the logits output by the model are:

Time:                  1       2       3       4
Input:              <bos>    hello   world   <eos>  
Output Logit:        L_0      L_1     L_2     L_3 

Thus the aligned logits and targets would be:

logit[:-1]:            L_0     L_1     L_2  
Input[1:]:            hello   world   <eos>  

And that is what the code does.
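
For anyone who wants to check the alignment by hand, here is a minimal pure-Python sketch of the same idea on the "hello world" example. It is not the repository's code: the four-word vocabulary, the logit values, and the `cross_entropy` helper are all made up for illustration, and the real speaker works on batched tensors rather than lists.

```python
import math

# Toy sequence from the example above, already wrapped in <bos>/<eos>.
tokens = ["<bos>", "hello", "world", "<eos>"]

# Hypothetical four-word vocabulary (an assumption for this sketch).
vocab = {tok: i for i, tok in enumerate(["<bos>", "hello", "world", "<eos>"])}

# Made-up logits: logits[t] scores the token that should FOLLOW tokens[t].
# Each row holds one score per vocabulary entry.
logits = [
    [0.0, 2.0, 0.1, 0.1],  # L_0: produced after reading <bos>
    [0.0, 0.1, 2.0, 0.1],  # L_1: produced after reading hello
    [0.0, 0.1, 0.1, 2.0],  # L_2: produced after reading world
    [0.5, 0.5, 0.5, 0.5],  # L_3: produced after reading <eos>, has no target
]


def cross_entropy(row, target_index):
    """Negative log-softmax probability of the target entry in one row."""
    log_norm = math.log(sum(math.exp(x) for x in row))
    return log_norm - row[target_index]


# Drop the last logit (nothing follows <eos>) and the first token
# (<bos> is never predicted), so both sides have the same length.
aligned_logits = logits[:-1]
aligned_targets = tokens[1:]

losses = [
    cross_entropy(row, vocab[tok])
    for row, tok in zip(aligned_logits, aligned_targets)
]
mean_loss = sum(losses) / len(losses)

for t, tok in enumerate(aligned_targets):
    print(f"L_{t} is scored against {tok!r}")
```

Without the two slices, `L_0` would be compared against `<bos>` and every later logit would be scored against the token it was fed rather than the token it should predict.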

Hope this answers your question!

Oh, that's very clear! Thank you so much!