Why are query, key, and value all set to the same value?
SanJoseCosta opened this issue · 2 comments
SanJoseCosta commented
Multi-head self-attention output (tf.keras.layers.MultiHeadAttention).
attn_output = self.mha(
    query=x,  # Query Q tensor.
    value=x,  # Value V tensor.
    key=x,  # Key K tensor.
    attention_mask=attention_mask,  # A boolean mask that prevents attention to certain positions.
    training=training,  # A boolean indicating whether the layer should behave in training mode.
)
SanJoseCosta commented
Also, running the notebook section "Test the decoder layer:" gives this error:
TypeError: Exception encountered when calling layer "decoder_layer_1" (type DecoderLayer).
call() got an unexpected keyword argument 'use_causal_mask'
MarkDaoust commented
Why are query, key, and value all set to the same value?
Because this mha layer is doing self-attention. Each location in x provides a query, a key, and a value.
There are Dense layers in the mha that project from the vectors in x to the q, k, and v vectors.
So this looks funny but it's really "make q, k, and v each from x".
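To make "make q, k, and v each from x" concrete, here is a minimal single-head sketch in plain Python. It is not how tf.keras.layers.MultiHeadAttention is implemented: the names w_q, w_k, w_v, self_attention, and rand_matrix are made up for illustration, the weights are random stand-ins for the learned projections, and biases, multiple heads, masking, and the output projection are all left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: q, k, and v are all projected from the same x."""
    q = matmul(x, w_q)  # each location in x provides a query
    k = matmul(x, w_k)  # ... a key
    v = matmul(x, w_v)  # ... and a value
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return q, k, v, matmul(weights, v)


def rand_matrix(rows, cols):
    """Hypothetical helper: random weights standing in for learned ones."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, d_head = 3, 4, 2  # illustrative sizes only

x = rand_matrix(seq_len, d_model)
w_q, w_k, w_v = (rand_matrix(d_model, d_head) for _ in range(3))

q, k, v, out = self_attention(x, w_q, w_k, w_v)
print(len(out), len(out[0]))  # one output vector of size d_head per location
```

The same x goes in three times, but because w_q, w_k, and w_v differ, the resulting q, k, and v differ too. Passing query=x, key=x, value=x to the layer just says "use x as the source for all three projections".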
TypeError: Exception encountered when calling layer "decoder_layer_1" (type DecoderLayer).
This feature (use_causal_mask) was only added in tensorflow-2.10. Check your TensorFlow version.
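One way to check, sketched with only the standard library so it runs even when TensorFlow is missing. The helper name supports_use_causal_mask is made up for this example, the 2.10 cutoff is taken from the reply above, and the lookup assumes the package is installed under the distribution name "tensorflow" (variants such as tensorflow-cpu would need a different name).

```python
from importlib import metadata


def supports_use_causal_mask(version):
    """Return True if a TensorFlow version string is 2.10 or newer.

    Hypothetical helper; the 2.10 cutoff comes from the reply in this thread.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 10)


try:
    # Assumes the distribution is named "tensorflow".
    installed = metadata.version("tensorflow")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("No distribution named 'tensorflow' was found")
else:
    print(installed, supports_use_causal_mask(installed))
```

The version is compared as a tuple of integers rather than as a string, because a plain string comparison would rank "2.9" above "2.10". With TensorFlow already imported, tf.__version__ gives the same version string.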