May I ask why PreNorm was removed? I am curious about the reason, because the Transformer encoder now differs from the architecture shown below, which is taken from the original paper.
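For context, here is a rough sketch of the two normalization orderings this question is about. It is plain Python and is not this repository's code: `layer_norm`, `pre_norm_block`, `post_norm_block`, and `toy_sublayer` are hypothetical names made up for illustration, and the layer norm has no learned scale or shift. It makes no claim about which ordering the paper or the current code uses.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_norm_block(x, sublayer):
    """Pre-norm ordering: x + sublayer(norm(x)). The residual path is left unnormalized."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]

def post_norm_block(x, sublayer):
    """Post-norm ordering: norm(x + sublayer(x)). The norm is applied after the residual add."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def toy_sublayer(x):
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    return [2.0 * v for v in x]

x = [1.0, 2.0, 3.0, 4.0]
pre = pre_norm_block(x, toy_sublayer)
post = post_norm_block(x, toy_sublayer)
```

The only difference between the two functions is where `layer_norm` sits relative to the residual addition. If the normalization was moved inside the sublayers rather than dropped, the computation could still match the pre-norm form even without a separate wrapper, though that is a guess that would need checking against the code.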