LayerNorm vs torch.nn.LayerNorm
Michael-H777 opened this issue · 1 comment
Michael-H777 commented
Hello,
I just want to ask: what is the functional difference between the implemented LayerNorm and the LayerNorm in PyTorch (not the sparse layer norm)?
Would using the LayerNorm from PyTorch impact performance?
KohakuBlueleaf commented
Looks like basically the same thing;
the implemented LayerNorm just adds the ability to handle layer norm over the channel dim
(for an N, C, H, W tensor, "channels first" mode does layer norm on dim C).
But for myself, I would use GroupNorm with num_groups=1, or just do some permutes on it (for the downsample part).
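One caveat worth noting as a sketch: `nn.GroupNorm` with `num_groups=1` normalizes each sample over all of C, H, W jointly, not over C alone at each spatial position, so it is not numerically identical to a channel-wise LayerNorm. A minimal check (tensor shapes here are just illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)  # (N, C, H, W), channels first

gn = nn.GroupNorm(num_groups=1, num_channels=8, affine=False)

# GroupNorm with one group reduces over all of (C, H, W) per sample
manual = (x - x.mean(dim=(1, 2, 3), keepdim=True)) / torch.sqrt(
    x.var(dim=(1, 2, 3), unbiased=False, keepdim=True) + gn.eps
)

print(torch.allclose(gn(x), manual, atol=1e-5))  # matches the joint reduction
```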
BTW, nn.LayerNorm and FB's custom LayerNorm are the "same thing" when run in channels-last mode.
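To illustrate the equivalence, here is a minimal sketch: normalizing over dim C of a channels-first tensor by hand (the biased-variance formula used by LayerNorm; the `eps` value is an assumption) gives the same result as permuting to channels last, applying `nn.LayerNorm`, and permuting back.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)  # (N, C, H, W), channels first
eps = 1e-6  # assumed eps; pick whatever the custom LayerNorm uses

# Manual channels-first layer norm over dim C (biased variance, like nn.LayerNorm)
u = x.mean(dim=1, keepdim=True)
s = (x - u).pow(2).mean(dim=1, keepdim=True)
ref = (x - u) / torch.sqrt(s + eps)

# Same thing via permute + nn.LayerNorm (channels last), then permute back
ln = nn.LayerNorm(8, eps=eps, elementwise_affine=False)
via_permute = ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(ref, via_permute, atol=1e-5))  # the two paths agree
```

The permute round-trip costs extra memory traffic, which is presumably why a fused channels-first implementation exists at all.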