yuhangzang/UPT

Mismatch between Figure 3a and Equation 5 in paper

krasserm opened this issue · 1 comment

Thank you for the very interesting paper and your plan to release the code. Since there is no initial code release yet (at the time of opening this issue), I have an implementation-related question: the lightweight transformer layer ${\theta}$ is defined in Equation 5 as

$U' = \text{SA}(U) + \text{LN}(U)$
$\hat{U} = \text{FFN}(\text{LN}(U')) + \text{LN}(U')$

whereas Figure 3a looks more like

$U' = \text{LN}(\text{SA}(U) + U)$
$\hat{U} = \text{LN}(\text{FFN}(U') + U')$

Which one is correct, i.e., which one is used in the implementation?
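To show that the two readings are not equivalent, here is a minimal pure-Python sketch of both. It is not the authors' implementation: `toy_sa` (single-head attention with identity projections), `toy_ffn` (an element-wise ReLU), and the parameter-free `layer_norm` are hypothetical stand-ins for SA, FFN, and LN, and the function names `layer_eq5` and `layer_fig3a` are mine.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize each row (token) to zero mean, unit variance; no learned scale/shift."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def add(a, b):
    """Element-wise sum of two equally shaped 2D lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def toy_sa(x):
    """Stand-in for SA: single-head self-attention with identity projections."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj / z * x[j][c] for j, wj in enumerate(w)) for c in range(d)])
    return out

def toy_ffn(x):
    """Stand-in for FFN: element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in x]

def layer_eq5(u, sa=toy_sa, ffn=toy_ffn, ln=layer_norm):
    """Equation 5 as quoted: U' = SA(U) + LN(U); U_hat = FFN(LN(U')) + LN(U')."""
    u1 = add(sa(u), ln(u))
    ln_u1 = ln(u1)
    return add(ffn(ln_u1), ln_u1)

def layer_fig3a(u, sa=toy_sa, ffn=toy_ffn, ln=layer_norm):
    """Figure 3a as read above: U' = LN(SA(U) + U); U_hat = LN(FFN(U') + U')."""
    u1 = ln(add(sa(u), u))
    return ln(add(ffn(u1), u1))

# Three tokens with four features each (arbitrary example values).
U = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0], [3.0, 3.0, -2.0, 1.0]]
out_eq5 = layer_eq5(U)
out_fig = layer_fig3a(U)
max_diff = max(abs(a - b) for ra, rb in zip(out_eq5, out_fig) for a, b in zip(ra, rb))
print("max abs difference:", max_diff)
```

One visible difference under these assumptions: the Figure 3a variant ends with LN, so every output row has zero mean, while the Equation 5 variant adds the FFN output after the last normalization, so its output rows are generally not normalized.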

I am also very interested in the answer to this question.