DongjunLee/transformer-tensorflow

Encoder self_attention input tensor clarification.

Artaches opened this issue · 3 comments

Hello, I was looking through your implementation, and I got a little confused by your encoder and decoder input tensors.
For the encoder, you have this build() in encoder.py:

def build(self, encoder_inputs):
        o1 = tf.identity(encoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                                 k=encoder_inputs,
                                                                 v=encoder_inputs), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)

        return o3

What I'm confused about is why you're using encoder_inputs as the query, key, and value tensors for the self_attention function in every layer. For the stacked layers, shouldn't the q, k, v tensors be the "o1 = tf.identity(o3)" outputs of the previous layer? In other words, shouldn't the build function look like this:

def build(self, encoder_inputs):
        o1 = tf.identity(encoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=encoder_inputs,
                                                                 k=encoder_inputs,
                                                                 v=encoder_inputs), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)
                encoder_inputs = o1     # Set the attention input tensors for next stack to be the output of this stack?

        return o3

Likewise in the decoder, "decoder_inputs" is never reset in the stack-building loop. Any clarification would be great! Thanks!
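As a side note, the symptom is easy to check with a toy NumPy sketch (hypothetical code, not from this repo): when q, k, and v are pinned to the raw encoder_inputs, the self-attention sublayer computes exactly the same tensor in every layer of the stack.

```python
import numpy as np

def self_attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(1).normal(size=(4, 6))  # toy (seq_len, d_model) inputs
# "Stacked" layers that all attend over the same raw inputs, as in the bug
attn_per_layer = [self_attention(x, x, x) for _ in range(3)]
print(np.allclose(attn_per_layer[0], attn_per_layer[2]))  # True: deeper layers add nothing new
```

Only the add-and-norm wrapper ever sees the changing o1, so stacking more layers barely changes the representation.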

Hello @Artaches,
I need to rename the parameter from encoder_inputs to decoder_inputs in the decoder.
And you're totally right: the q, k, v tensors never get reset.

I'll fix it like below. :)

encoder.py

def build(self, encoder_inputs):
        o1 = tf.identity(encoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)

        return o3

decoder.py

def build(self, decoder_inputs):
        o1 = tf.identity(decoder_inputs)

        for i in range(1, self.num_layers+1):
            with tf.variable_scope(f"layer-{i}"):
                o2 = self._add_and_norm(o1, self._self_attention(q=o1, k=o1, v=o1), num=1)
                o3 = self._add_and_norm(o2, self._positional_feed_forward(o2), num=2)
                o1 = tf.identity(o3)

        return o3
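The corrected pattern can be sketched end to end in a few lines of NumPy (a minimal illustration under my own toy attention and layer-norm helpers, not the repo's TensorFlow code): each layer's q, k, v come from the previous layer's output, so the stack actually deepens the representation.

```python
import numpy as np

def self_attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def layer_norm(x, eps=1e-6):
    # Normalize each position over the model dimension
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def encoder_stack(inputs, num_layers=3):
    o1 = inputs
    for _ in range(num_layers):
        # q, k, v are the PREVIOUS layer's output, not the raw inputs
        o2 = layer_norm(o1 + self_attention(o1, o1, o1))
        o3 = layer_norm(o2 + np.tanh(o2))  # stand-in for the feed-forward sublayer
        o1 = o3
    return o1

x = np.random.default_rng(0).normal(size=(5, 8))  # (seq_len, d_model)
out = encoder_stack(x)
print(out.shape)  # (5, 8): shape is preserved through the stack
```

The shape is preserved layer to layer, which is what lets the same residual-plus-norm block be repeated num_layers times.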

Thank you so much.

Thank you for the clarification! Great implementation, by the way! I've found many others hard to follow, and this one helped me the most in understanding the model.

My pleasure. :)