Encoder self_attention input tensor clarification.
Artaches opened this issue · 3 comments
Hello, I was looking through your implementation, and I got a little confused by your encoder and decoder input tensors.
For the encoder, you have this in build() in encoder.py:
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                             k=encoder_inputs,
                                                             v=encoder_inputs), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
What I'm confused about is why you're using encoder_inputs as the query, key, and value tensors of the self_attention function in every layer. For the stacked layers, shouldn't the q, k, v tensors be the previous layer's output, i.e. the `o1 = tf.identity(o3)` result? In other words, shouldn't the build function look like this:
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                             k=encoder_inputs,
                                                             v=encoder_inputs), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
        encoder_inputs = o1  # set the attention input tensors for the next stack to the output of this stack?
    return o3
```
Likewise in the decoder, decoder_inputs never gets reset in the stack-building loop. Any clarification would be great! Thanks!
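To see why this matters, here is a toy sketch (not the repo's code) of the two loop shapes side by side. The `self_attention` and `add_and_norm` functions below are hypothetical stand-ins for the real sub-layers, just enough to show that the stacks compute different functions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]  # one toy weight per layer
x0 = rng.standard_normal((2, 4))                           # toy encoder input

def self_attention(q, w):
    return np.tanh(q @ w)   # stand-in for attention over q

def add_and_norm(x, sub):
    return x + sub          # residual add; normalization omitted for brevity

# Buggy pattern: the residual path chains through o1, but every layer's
# attention still reads the ORIGINAL encoder input x0.
o1 = x0
for w in W:
    o1 = add_and_norm(o1, self_attention(x0, w))

# Fixed pattern: each layer's attention consumes the running output f1.
f1 = x0
for w in W:
    f1 = add_and_norm(f1, self_attention(f1, w))

print(np.allclose(o1, f1))  # the two stacks diverge after the first layer
```

With only one layer the two loops agree; from the second layer on they differ, because the fixed loop attends over transformed representations while the buggy one keeps attending over the raw input.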
Hello @Artaches,
I need to change the parameter from encoder_inputs to decoder_inputs in the decoder.
And you're totally right, the q, k, v tensors never get reset.
I'll fix it like below. :)
encoder.py
```python
def build(self, encoder_inputs):
    o1 = tf.identity(encoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
decoder.py
```python
def build(self, decoder_inputs):
    o1 = tf.identity(decoder_inputs)
    for i in range(1, self.num_layers+1):
        with tf.variable_scope(f"layer-{i}"):
            o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
            o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
            o1 = tf.identity(o3)
    return o3
```
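Both fixed loops share one shape: fold the input through the layers, with each layer consuming the previous layer's output. A minimal sketch of that pattern (the `encoder_layer` below is a hypothetical stand-in, not the repo's code):

```python
from functools import reduce

def encoder_layer(o1, i):
    # Hypothetical stand-in for one layer of self-attention + feed-forward
    # with residual connections; here just a fixed scaling.
    return o1 * 1.1

def build(encoder_inputs, num_layers=6):
    # Each pass feeds the previous layer's output forward, which is exactly
    # what reassigning o1 inside the fixed loops achieves.
    return reduce(encoder_layer, range(1, num_layers + 1), encoder_inputs)

print(build(1.0))  # compounds the layer transform: 1.1 ** 6 ≈ 1.7716
```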
Thank you so much.
Thank you for the clarification! Great implementation, btw! I've found many others hard to follow, and this implementation helped me the most in understanding the model.
My pleasure. :)