LayerNorm vs torch.nn.LayerNorm
Michael-H777 opened this issue · 1 comment
Michael-H777 commented
Hello,
I just want to ask: what is the functional difference between the implemented LayerNorm and the LayerNorm in PyTorch (not the sparse layer norm)?
Would using the LayerNorm from PyTorch impact performance?
KohakuBlueleaf commented
Looks like basically the same thing;
the implemented LayerNorm just adds the ability to handle layer norm over the channel dim
(for an N, C, H, W tensor, "channels first" mode does layer norm on dim C).
But for myself, I would use GroupNorm with num_groups=1, or just do some permutes on it (for the downsample part).
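One caveat worth noting as a sketch: `nn.GroupNorm` with `num_groups=1` normalizes each sample over all of C, H, W jointly, not over C alone at each spatial position, so it is not numerically identical to a channel-wise LayerNorm. A minimal check (tensor shapes here are just illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)  # (N, C, H, W), channels first

gn = nn.GroupNorm(num_groups=1, num_channels=8, affine=False)

# GroupNorm with one group reduces over all of (C, H, W) per sample
manual = (x - x.mean(dim=(1, 2, 3), keepdim=True)) / torch.sqrt(
    x.var(dim=(1, 2, 3), unbiased=False, keepdim=True) + gn.eps
)

print(torch.allclose(gn(x), manual, atol=1e-5))  # matches the joint reduction
```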
BTW, nn.LayerNorm and FB's custom LayerNorm are the "same thing" when run in channels-last mode.
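To illustrate the equivalence, here is a minimal sketch: normalizing over dim C of a channels-first tensor by hand (the biased-variance formula used by LayerNorm; the `eps` value is an assumption) gives the same result as permuting to channels last, applying `nn.LayerNorm`, and permuting back.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)  # (N, C, H, W), channels first
eps = 1e-6  # assumed eps; pick whatever the custom LayerNorm uses

# Manual channels-first layer norm over dim C (biased variance, like nn.LayerNorm)
u = x.mean(dim=1, keepdim=True)
s = (x - u).pow(2).mean(dim=1, keepdim=True)
ref = (x - u) / torch.sqrt(s + eps)

# Same thing via permute + nn.LayerNorm (channels last), then permute back
ln = nn.LayerNorm(8, eps=eps, elementwise_affine=False)
via_permute = ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(ref, via_permute, atol=1e-5))  # the two paths agree
```

The permute round-trip costs extra memory traffic, which is presumably why a fused channels-first implementation exists at all.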