The code may not be consistent with the formula
kafmws opened this issue · 2 comments
kafmws commented
In the paper, the description of the encoder in Section 3.1 ("IPT architecture") is:
$y_0 = [ E_{p1} + f_{p1} , E_{p2} + f_{p2} , \dots, E_{pN} + f_{pN} ],$
$q_i = k_i = v_i = LN(y_{i-1}),$
$y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$
$\cdots$
However, in the class `TransformerEncoderLayer` of the Python file `model/ipt.py`, the code reads:
```python
class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, no_norm=False,
                 activation="relu"):
        ...

    def with_pos_embed(self, tensor, pos):
        return tensor if pos is None else tensor + pos

    def forward(self, src, pos=None):
        src2 = self.norm1(src)                   # here: LN is applied first
        q = k = self.with_pos_embed(src2, pos)   # here: pos is added to q and k (only) in every layer
        src2 = self.self_attn(q, k, src2)        # here: v = LN(src), without pos
        src = src + self.dropout1(src2[0])
        src2 = self.norm2(src)
        src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
        src = src + self.dropout2(src2)
        return src
```
It seems more likely that the formula matching the code should be:
$y_0 = [ f_{p1} , f_{p2} , \dots, f_{pN} ],$
$q_i = k_i = LN(y_{i-1}) + E_{p}, \quad v_i = LN(y_{i-1}),$
$y_i^{\prime} = MSA(q_i, k_i, v_i) + y_{i-1},$
$\cdots$
where $E_{p} = [ E_{p1} , E_{p2} , \dots, E_{pN} ]$ is the position encoding. Or maybe the two are equivalent? I'm confused. Thanks.
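To compare the two readings concretely, here is a minimal, self-contained sketch of a single encoder layer. It is not the repository's code: `torch.nn.MultiheadAttention` and `nn.LayerNorm` stand in for the layer's modules, and the names `feat`, `pos`, and the dimensions are made up. It just runs the same random input through both formulations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, b, d = 4, 1, 8                         # sequence length, batch, embedding dim (arbitrary)
attn = nn.MultiheadAttention(d, num_heads=2)
norm = nn.LayerNorm(d)
feat = torch.randn(n, b, d)               # f_p: patch features
pos = torch.randn(n, b, d)                # E_p: position encoding

# Reading 1 (paper): pos is added once to the input, then q = k = v = LN(y_{i-1}).
y = feat + pos
qkv = norm(y)
out_paper = attn(qkv, qkv, qkv)[0] + y

# Reading 2 (code): LN first, then pos is re-added to q and k only; v carries no pos.
y = feat
src2 = norm(y)
q = k = src2 + pos
out_code = attn(q, k, src2)[0] + y

print(torch.allclose(out_paper, out_code))  # prints False in general: the outputs differ
```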
HantingChen commented
Sorry for the confusion. The two formulas are not equivalent, and the second one is correct.
kafmws commented
Thanks for your reply.