bryanlimy/tf2-transformer-chatbot

Possible bug in the padding mask handling

Opened this issue · 2 comments

I've stared at these lines in your excellent tutorial for a while now:

 enc_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='enc_padding_mask')(inputs)
 # mask the future tokens for decoder inputs at the 1st attention block
 look_ahead_mask = tf.keras.layers.Lambda(
     create_look_ahead_mask,
     output_shape=(1, None, None),
     name='look_ahead_mask')(dec_inputs)
 # mask the encoder outputs for the 2nd attention block
 dec_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='dec_padding_mask')(inputs)

Both Lambda layers apply create_padding_mask to the same inputs tensor, so enc_padding_mask and dec_padding_mask will always be equal. Is this intentional? It seems odd to create two separate padding masks that are identical.
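
To make it concrete, here is a minimal standalone check (a sketch, assuming create_padding_mask flags the padding token id 0, as in the tutorial):

 import tensorflow as tf

 def create_padding_mask(x):
     # 1.0 where the token id is the padding id 0, else 0.0
     mask = tf.cast(tf.math.equal(x, 0), tf.float32)
     # shape: (batch_size, 1, 1, sequence_length)
     return mask[:, tf.newaxis, tf.newaxis, :]

 inputs = tf.constant([[5, 7, 2, 0, 0]])  # one padded input sentence
 enc_mask = create_padding_mask(inputs)   # what enc_padding_mask computes
 dec_mask = create_padding_mask(inputs)   # what dec_padding_mask computes
 print(tf.reduce_all(tf.equal(enc_mask, dec_mask)).numpy())  # True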

Yes, I believe so. Both masks are meant to mask out the padding tokens in the input sentence; see http://nlp.seas.harvard.edu/2018/04/03/attention.html#batches-and-masking
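
For context, the padding mask is applied inside scaled dot-product attention by adding it to the attention logits, so padded key positions get a very large negative score and receive essentially zero weight after the softmax. A simplified sketch, assuming the attention implementation follows the usual TF2 tutorial pattern:

 import tensorflow as tf

 def scaled_dot_product_attention(query, key, value, mask):
     # attention logits, scaled by sqrt(depth)
     depth = tf.cast(tf.shape(key)[-1], tf.float32)
     logits = tf.matmul(query, key, transpose_b=True) / tf.math.sqrt(depth)
     # padded positions (mask == 1.0) get a large negative logit ...
     if mask is not None:
         logits += mask * -1e9
     # ... so they get ~zero attention weight after the softmax
     weights = tf.nn.softmax(logits, axis=-1)
     return tf.matmul(weights, value)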

Oh, I see. But then wouldn't it be more efficient to use only one mask?
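
Something like this is what I have in mind; a minimal sketch, assuming the tutorial's create_padding_mask and the Keras functional-API inputs from transformer(). The single tensor would then be passed wherever enc_padding_mask or dec_padding_mask is currently used:

 import tensorflow as tf

 def create_padding_mask(x):
     # same helper as in the tutorial: 1.0 at padding id 0
     mask = tf.cast(tf.math.equal(x, 0), tf.float32)
     return mask[:, tf.newaxis, tf.newaxis, :]

 inputs = tf.keras.Input(shape=(None,), name='inputs')

 # one Lambda instead of two identical ones
 padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='padding_mask')(inputs)

 # padding_mask would then replace enc_padding_mask in the encoder's
 # self-attention and dec_padding_mask in the decoder's 2nd attention block

The saving is small (one Lambda layer and one mask computation per forward pass), but it also makes it explicit that the two attention blocks share the same mask.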