bryanlimy/tf2-transformer-chatbot

Possible bug in the padding mask handling

Opened this issue · 2 comments

I've stared at these lines in your excellent tutorial for a while now:

 enc_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='enc_padding_mask')(inputs)
 # mask the future tokens for decoder inputs at the 1st attention block
 look_ahead_mask = tf.keras.layers.Lambda(
     create_look_ahead_mask,
     output_shape=(1, None, None),
     name='look_ahead_mask')(dec_inputs)
 # mask the encoder outputs for the 2nd attention block
 dec_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='dec_padding_mask')(inputs)

Both Lambda layers apply create_padding_mask to the same inputs tensor, so enc_padding_mask and dec_padding_mask will always be equal. Is this intentional? It seems odd to create two separate padding masks that are identical.
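
To make it concrete, here is a minimal standalone check (a sketch, assuming create_padding_mask flags the padding token id 0, as in the tutorial):

 import tensorflow as tf

 def create_padding_mask(x):
     # 1.0 where the token id is the padding id 0, else 0.0
     mask = tf.cast(tf.math.equal(x, 0), tf.float32)
     # shape: (batch_size, 1, 1, sequence_length)
     return mask[:, tf.newaxis, tf.newaxis, :]

 inputs = tf.constant([[5, 7, 2, 0, 0]])  # one padded input sentence
 enc_mask = create_padding_mask(inputs)   # what enc_padding_mask computes
 dec_mask = create_padding_mask(inputs)   # what dec_padding_mask computes
 print(tf.reduce_all(tf.equal(enc_mask, dec_mask)).numpy())  # True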

Yes, I believe so. Both masks are meant to mask out the padding tokens in the input sentence; see http://nlp.seas.harvard.edu/2018/04/03/attention.html#batches-and-masking
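
For context, the padding mask is applied inside scaled dot-product attention by adding it to the attention logits, so padded key positions get a very large negative score and receive essentially zero weight after the softmax. A simplified sketch, assuming the attention implementation follows the usual TF2 tutorial pattern:

 import tensorflow as tf

 def scaled_dot_product_attention(query, key, value, mask):
     # attention logits, scaled by sqrt(depth)
     depth = tf.cast(tf.shape(key)[-1], tf.float32)
     logits = tf.matmul(query, key, transpose_b=True) / tf.math.sqrt(depth)
     # padded positions (mask == 1.0) get a large negative logit ...
     if mask is not None:
         logits += mask * -1e9
     # ... so they get ~zero attention weight after the softmax
     weights = tf.nn.softmax(logits, axis=-1)
     return tf.matmul(weights, value)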

Oh, I see. But then wouldn't it be more efficient to use only one mask?
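
Something like this is what I have in mind; a minimal sketch, assuming the tutorial's create_padding_mask and the Keras functional-API inputs from transformer(). The single tensor would then be passed wherever enc_padding_mask or dec_padding_mask is currently used:

 import tensorflow as tf

 def create_padding_mask(x):
     # same helper as in the tutorial: 1.0 at padding id 0
     mask = tf.cast(tf.math.equal(x, 0), tf.float32)
     return mask[:, tf.newaxis, tf.newaxis, :]

 inputs = tf.keras.Input(shape=(None,), name='inputs')

 # one Lambda instead of two identical ones
 padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='padding_mask')(inputs)

 # padding_mask would then replace enc_padding_mask in the encoder's
 # self-attention and dec_padding_mask in the decoder's 2nd attention block

The saving is small (one Lambda layer and one mask computation per forward pass), but it also makes it explicit that the two attention blocks share the same mask.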