The code may not be consistent with the formula
kafmws opened this issue · 2 comments
kafmws commented
In the paper, the description of the encoder in Section 3.1 ("IPT architecture") is:
$y_0 = [ E_{p1} + f_{p1} , E_{p2} + f_{p2} , \dots, E_{pN} + f_{pN} ],$
$q_i = k_i = v_i = LN(y_{i-1}),$
$y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$
$\cdots$
However, in the class `TransformerEncoderLayer` of the Python file `model/ipt.py`, the code reads:
```python
class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, no_norm=False,
                 activation="relu"):
        ...

    def with_pos_embed(self, tensor, pos):
        return tensor if pos is None else tensor + pos

    def forward(self, src, pos=None):
        src2 = self.norm1(src)                   # here: LN is applied first
        q = k = self.with_pos_embed(src2, pos)   # here: pos is added to q and k (only) in every layer
        src2 = self.self_attn(q, k, src2)        # here: v = LN(src), without pos
        src = src + self.dropout1(src2[0])
        src2 = self.norm2(src)
        src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
        src = src + self.dropout2(src2)
        return src
```
It seems more likely that the formula matching the code should be:
$y_0 = [ f_{p1} , f_{p2} , \dots, f_{pN} ],$
$q_i = k_i = LN(y_{i-1}) + E_{p}, \quad v_i = LN(y_{i-1}),$
$y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$
$\cdots$
where $E_{p} = [ E_{p1} , E_{p2} , \dots, E_{pN} ]$ is the position encoding. Or maybe the two are equivalent? I'm confused. Thanks.
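To compare the two readings concretely, here is a minimal, self-contained sketch of a single encoder layer. It is not the repository's code: `torch.nn.MultiheadAttention` and `nn.LayerNorm` stand in for the layer's modules, and the names `feat`, `pos`, and the dimensions are made up. It just runs the same random input through both formulations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, b, d = 4, 1, 8                         # sequence length, batch, embedding dim (arbitrary)
attn = nn.MultiheadAttention(d, num_heads=2)
norm = nn.LayerNorm(d)
feat = torch.randn(n, b, d)               # f_p: patch features
pos = torch.randn(n, b, d)                # E_p: position encoding

# Reading 1 (paper): pos is added once to the input, then q = k = v = LN(y_{i-1}).
y = feat + pos
qkv = norm(y)
out_paper = attn(qkv, qkv, qkv)[0] + y

# Reading 2 (code): LN first, then pos is re-added to q and k only; v carries no pos.
y = feat
src2 = norm(y)
q = k = src2 + pos
out_code = attn(q, k, src2)[0] + y

print(torch.allclose(out_paper, out_code))  # prints False in general: the outputs differ
```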
HantingChen commented
Sorry for the confusion. The two formulas are not equivalent, and the second one is correct.
kafmws commented
Thanks for your reply.