Encoder self_attention input tensor clarification.
Artaches opened this issue · 3 comments
Hello, I was looking through your implementation, and I got a little confused by your encoder and decoder input tensors.
For the encoder, you have this in build() in encoder.py:
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                             k=encoder_inputs,
                                                             v=encoder_inputs), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
What I'm confused about is why you're using encoder_inputs as the query, key, and value tensors of the self_attention function in every layer. For the stacked layers, shouldn't the q, k, v tensors be the previous layer's output, i.e. the `o1 = tf.identity(o3)` result? In other words, shouldn't the build function look like this:
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                             k=encoder_inputs,
                                                             v=encoder_inputs), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
        encoder_inputs = o1  # set the attention input tensors for the next stack to the output of this stack?
    return o3
```
Likewise in the decoder, decoder_inputs never gets reset in the stack-building loop. Any clarification would be great! Thanks!
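To see why this matters, here is a toy sketch (not the repo's code) of the two loop shapes side by side. The `self_attention` and `add_and_norm` functions below are hypothetical stand-ins for the real sub-layers, just enough to show that the stacks compute different functions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]  # one toy weight per layer
x0 = rng.standard_normal((2, 4))                           # toy encoder input

def self_attention(q, w):
    return np.tanh(q @ w)   # stand-in for attention over q

def add_and_norm(x, sub):
    return x + sub          # residual add; normalization omitted for brevity

# Buggy pattern: the residual path chains through o1, but every layer's
# attention still reads the ORIGINAL encoder input x0.
o1 = x0
for w in W:
    o1 = add_and_norm(o1, self_attention(x0, w))

# Fixed pattern: each layer's attention consumes the running output f1.
f1 = x0
for w in W:
    f1 = add_and_norm(f1, self_attention(f1, w))

print(np.allclose(o1, f1))  # the two stacks diverge after the first layer
```

With only one layer the two loops agree; from the second layer on they differ, because the fixed loop attends over transformed representations while the buggy one keeps attending over the raw input.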
Hello @Artaches,
I need to change the parameter from encoder_inputs to decoder_inputs in the decoder.
And you're totally right, the q, k, v tensors never get reset.
I'll fix it like below. :)
encoder.py
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
decoder.py
```python
def build(self, decoder_inputs):
    o1 = tf.identity(decoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
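Both fixed loops share one shape: fold the input through the layers, with each layer consuming the previous layer's output. A minimal sketch of that pattern (the `encoder_layer` below is a hypothetical stand-in, not the repo's code):

```python
from functools import reduce

def encoder_layer(o1, i):
    # Hypothetical stand-in for one layer of self-attention + feed-forward
    # with residual connections; here just a fixed scaling.
    return o1 * 1.1

def build(encoder_inputs, num_layers=6):
    # Each pass feeds the previous layer's output forward, which is exactly
    # what reassigning o1 inside the fixed loops achieves.
    return reduce(encoder_layer, range(1, num_layers + 1), encoder_inputs)

print(build(1.0))  # compounds the layer transform: 1.1 ** 6 ≈ 1.7716
```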
Thank you so much.
Thank you for the clarification! Great implementation, btw! I've found many others hard to follow, and this implementation helped me the most in understanding the model.
My pleasure. :)