lucidrains/vit-pytorch

Why Remove PreNorm?


May I ask why PreNorm was removed? I am very curious about the reason.
[image]
The Transformer encoder in the code now differs from the architecture shown below, which is from the original paper.
[image: Transformer encoder architecture from the original paper]
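
To make the question concrete, here is a minimal sketch of what "pre-norm" means in the encoder block. It rests on an assumption this thread does not confirm: that a `PreNorm` wrapper can disappear from the code without changing the computation, as long as the normalization is applied as the first step inside each sub-layer. Plain Python lists stand in for tensors to keep the sketch dependency-free, and `layer_norm`, `feed_forward`, `feed_forward_with_norm`, and `residual` are hypothetical names for illustration, not the repository's actual code.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def feed_forward(x):
    """Hypothetical stand-in for an attention or MLP sub-layer."""
    return [2.0 * v + 1.0 for v in x]


class PreNorm:
    """Wrapper style: normalize the input, then call the wrapped sub-layer."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(layer_norm(x))


def feed_forward_with_norm(x):
    """Folded-in style: the sub-layer normalizes its own input as its first step."""
    return feed_forward(layer_norm(x))


def residual(fn, x):
    """Residual connection around a sub-layer: x + fn(x)."""
    return [a + b for a, b in zip(x, fn(x))]


x = [1.0, 2.0, 4.0]
out_wrapper = residual(PreNorm(feed_forward), x)      # x + fn(norm(x)) via wrapper
out_folded = residual(feed_forward_with_norm, x)      # x + fn(norm(x)) with norm inside
print(out_wrapper == out_folded)  # True: both styles perform the same operations
```

If the normalization was dropped entirely rather than moved, the block would compute `x + fn(x)` instead, which is a different architecture from the one in the paper's diagram.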