facebookresearch/ConvNeXt-V2

LayerNorm vs torch.nn.LayerNorm

Michael-H777 opened this issue · 1 comment

Hello,

I just want to ask: what is the functional difference between the LayerNorm implemented in this repo and torch.nn.LayerNorm (not the sparse layer norm)?

Would using torch.nn.LayerNorm instead impact performance?

They look like basically the same thing; the implemented LayerNorm just adds the ability to normalize over the channel dimension
(with channels_first input of shape (N, C, H, W), it applies layer norm over dim C).
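
For reference, here is a minimal sketch of how such a dual-mode LayerNorm is typically written (paraphrased from ConvNeXt-style code, so treat the exact names and defaults as assumptions rather than the repo's exact source): in channels_last mode it just calls F.layer_norm, and in channels_first mode it computes the mean/variance over dim 1 by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm(nn.Module):
    """LayerNorm supporting (N, H, W, C) channels_last and (N, C, H, W) channels_first.

    Sketch paraphrased from ConvNeXt-style code; names and defaults are
    assumptions, not necessarily the repo's exact implementation.
    """
    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = eps
        self.data_format = data_format
        self.normalized_shape = (normalized_shape,)

    def forward(self, x):
        if self.data_format == "channels_last":
            # Identical to nn.LayerNorm: normalize over the trailing C dim.
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        # channels_first: normalize over dim 1 (C) at every spatial position.
        u = x.mean(1, keepdim=True)
        s = (x - u).pow(2).mean(1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.eps)
        return self.weight[:, None, None] * x + self.bias[:, None, None]
```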

But personally, I would either use GroupNorm with num_groups=1 or just permute the tensor (for the downsample part); see the sketch below.
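
A quick illustration of those two alternatives (shapes and variable names are just for the example). One caveat worth noting: GroupNorm with a single group pools statistics over C, H, and W together, so it is not numerically identical to a per-position channel LayerNorm.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 56, 56)  # (N, C, H, W)

# Alternative 1: permute to channels_last, apply nn.LayerNorm, permute back.
ln = nn.LayerNorm(64)
y = ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

# Alternative 2: GroupNorm with a single group.
# Caveat: this computes mean/var over C*H*W jointly, not per spatial
# position, so it is not numerically identical to the LayerNorm above.
gn = nn.GroupNorm(num_groups=1, num_channels=64)
z = gn(x)
```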

BTW, nn.LayerNorm and FB's custom LayerNorm are the same thing when run in channels_last mode.
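
This is easy to check with the sketch above (assuming that class matches the repo's behavior): copy the affine parameters across and compare outputs on a channels_last tensor.

```python
x = torch.randn(2, 56, 56, 64)  # (N, H, W, C), channels_last layout

official = nn.LayerNorm(64, eps=1e-6)
custom = LayerNorm(64, eps=1e-6, data_format="channels_last")  # sketch above

# Share the affine parameters so the comparison is apples-to-apples.
with torch.no_grad():
    custom.weight.copy_(official.weight)
    custom.bias.copy_(official.bias)

print(torch.allclose(official(x), custom(x)))  # True
```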