moein-shariatnia/Pix2Seq

autoregressive setup during validation

etienne87 opened this issue · 3 comments

Hello, one thing I do not understand about this model is that the input to the decoder is exactly the ground truth. Do you evaluate in autoregressive mode? What prevents the model from just learning the identity?
EDIT: To answer my own question, teacher forcing means the model does not have direct access to the current token of the input sequence, only to the previous timesteps (thanks to the upper triangular attention mask). This is why it is fine during training, but in a real inference scenario you will not have access to the sequence, hence the need for the autoregressive setup.

Hi,

During both training and validation epochs, the input to the transformer is the sequence from the BOS token up to the penultimate token, and the ground truth fed to the criterion is the sequence shifted by one (from the token after BOS to the last token). So, I am using "teacher forcing" in my implementation.
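A minimal sketch of that shifting, assuming a generic PyTorch model and criterion (the names `model`, `image`, and `seq` are placeholders here, not this repo's actual API):

```python
import torch

# Hypothetical training step illustrating teacher forcing with shifted sequences.
def training_step(model, criterion, image, seq):
    # seq: (batch, seq_len), starting with BOS
    decoder_input = seq[:, :-1]   # BOS ... penultimate token, fed to the decoder
    target = seq[:, 1:]           # token after BOS ... last token, fed to the criterion

    logits = model(image, decoder_input)          # (batch, seq_len - 1, vocab_size)
    loss = criterion(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    return loss
```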

The decoder mask in the self-attention layers prevents the decoder from peeking at future tokens and ensures that only the previous tokens are visible to the model.
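For illustration, a standard causal (upper triangular) attention mask in PyTorch looks roughly like this; it is not the repo's exact code:

```python
import torch

# Additive causal mask: position i can only attend to positions <= i.
# Entries above the diagonal are -inf, so they vanish after the softmax.
def causal_mask(size):
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)

print(causal_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0.,   0., -inf, -inf],
#         [0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.]])
```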

I agree that it is not cheating and that this is fine for training in parallel, but shouldn't validation be done with a for loop? Otherwise you are not really using your own predictions, but the ground truth.

Never mind, I see you actually run the loop in test.py!
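For anyone else who lands here, a rough sketch of what such an autoregressive greedy decoding loop looks like (the names `model`, `bos_idx`, `eos_idx`, and `max_len` are illustrative assumptions, not the actual code in test.py):

```python
import torch

@torch.no_grad()
def generate(model, image, bos_idx, eos_idx, max_len=100):
    batch_size = image.size(0)
    # Start every sequence with the BOS token only.
    seq = torch.full((batch_size, 1), bos_idx, dtype=torch.long, device=image.device)
    for _ in range(max_len):
        logits = model(image, seq)                               # (batch, cur_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy pick for the last position
        seq = torch.cat([seq, next_token], dim=1)                # feed the model's own prediction back in
        if (next_token == eos_idx).all():
            break
    return seq
```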