Why are query, key, and value all set to the same value?

Question

Why are query, key, and value all set to the same value?

SanJoseCosta opened this issue 2 years ago · 2 comments

Multi-head self-attention output (`tf.keras.layers.MultiHeadAttention`).

attn_output = self.mha(
    query=x,  # Query Q tensor.
    value=x,  # Value V tensor.
    key=x,  # Key K tensor.
    attention_mask=attention_mask, # A boolean mask that prevents attention to certain positions.
    training=training, # A boolean indicating whether the layer should behave in training mode.
    )

Answer 1 · 2022-10-01T17:35:24.000Z

Also, this is the result when running the notebook section "Test the decoder layer":

TypeError: Exception encountered when calling layer "decoder_layer_1" (type DecoderLayer).

call() got an unexpected keyword argument 'use_causal_mask'

Answer 2 · 2022-10-03T18:53:06.000Z

Why are query, key, and value all set to the same value?

Because this MHA layer is doing self-attention. Each location in x provides a query, a key, and a value.

There are Dense layers inside the MHA layer that project the vectors in x to q, k, and v vectors.
So this looks funny, but it's really "make q, k, and v each from x".
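
To make that concrete, here is a minimal NumPy sketch of a single attention head. It is not the Keras implementation: the names `w_q`, `w_k`, `w_v` and the sizes are made up for illustration, and biases, multiple heads, and the final output projection are left out. It only shows that passing the same x three times still gives three different tensors once each goes through its own projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the notebook).
seq_len, d_model, d_head = 4, 8, 3
x = rng.normal(size=(seq_len, d_model))  # one sequence, no batch dimension

# Three independent projection matrices, standing in for the Dense layers.
w_q = rng.normal(size=(d_model, d_head))
w_k = rng.normal(size=(d_model, d_head))
w_v = rng.normal(size=(d_model, d_head))

# The same x goes in three times, but q, k, and v come out different.
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Scaled dot-product attention for a single head.
scores = q @ k.T / np.sqrt(d_head)              # (seq_len, seq_len)
scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
attn_output = weights @ v                       # (seq_len, d_head)

print(attn_output.shape)  # (4, 3)
```

Cross-attention would differ only in where the inputs come from: q would be projected from one sequence, and k and v from another.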

TypeError: Exception encountered when calling layer "decoder_layer_1" (type DecoderLayer).

The `use_causal_mask` argument was only added in TensorFlow 2.10. Check your TensorFlow version.
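
A quick way to check, as a rough sketch: the helper name `supports_use_causal_mask` is hypothetical (it is not part of TensorFlow), and it assumes the version string starts with `major.minor`. The comparison is done on integers because comparing the strings directly would rank "2.9" above "2.10".

```python
def supports_use_causal_mask(version: str) -> bool:
    """Hypothetical helper: True if the version string is 2.10 or newer."""
    # Compare as integers; as plain strings, "2.9" would sort after "2.10".
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 10)


print(supports_use_causal_mask("2.9.1"))   # False
print(supports_use_causal_mask("2.10.0"))  # True
```

In the notebook you would pass `tf.__version__` to it after `import tensorflow as tf`.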
