Add&Norm layer is missing after each attention layer
Yizheng-Sun opened this issue · 0 comments
Yizheng-Sun commented
Hi,
According to the transformer architecture, there should be an Add&Norm layer after each attention layer. However, in the code in /docs/tutorials/transformer.ipynb, these Add&Norm layers appear to be missing.
Take the decoder layer as an example; the original code is:
```python
def call(self, x, context):
  x = self.causal_self_attention(x=x)
  x = self.cross_attention(x=x, context=context)

  # Cache the last attention scores for plotting later.
  self.last_attn_scores = self.cross_attention.last_attn_scores

  x = self.ffn(x)  # Shape `(batch_size, seq_len, d_model)`.
  return x
```
With the Add&Norm layers added, it would become:
```python
def call(self, x, context):
  x = self.add([x, self.causal_self_attention(x=x)])
  x = self.layer_norm1(x)
  x = self.add([x, self.cross_attention(x=x, context=context)])
  x = self.layer_norm2(x)

  # Cache the last attention scores for plotting later.
  self.last_attn_scores = self.cross_attention.last_attn_scores

  x = self.ffn(x)  # Shape `(batch_size, seq_len, d_model)`.
  return x
```
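For completeness, here is a minimal sketch of the `__init__` that would back the proposed `call` above. The attribute names `add`, `layer_norm1` and `layer_norm2` are just the ones I used in the snippet, and the sub-layer constructors are assumed to match the tutorial's `CausalSelfAttention`, `CrossAttention` and `FeedForward` classes defined earlier in the notebook:

```python
import tensorflow as tf

class DecoderLayer(tf.keras.layers.Layer):
  """Decoder layer with explicit Add & Norm, assuming the tutorial's
  CausalSelfAttention, CrossAttention and FeedForward classes."""

  def __init__(self, *, d_model, num_heads, dff, dropout_rate=0.1):
    super().__init__()
    # Attention and feed-forward sub-layers as in the tutorial (argument
    # names assumed from the notebook).
    self.causal_self_attention = CausalSelfAttention(
        num_heads=num_heads, key_dim=d_model, dropout=dropout_rate)
    self.cross_attention = CrossAttention(
        num_heads=num_heads, key_dim=d_model, dropout=dropout_rate)
    self.ffn = FeedForward(d_model, dff)

    # Explicit residual-add and layer-norm layers used by the call() above.
    self.add = tf.keras.layers.Add()
    self.layer_norm1 = tf.keras.layers.LayerNormalization()
    self.layer_norm2 = tf.keras.layers.LayerNormalization()
```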
I wonder whether those Add&Norm layers were left out on purpose.