bzhangGo/rmsnorm

Normalization of CNN in CIFAR-10 experiments


Hi,
Congratulations on the amazing work. I have a few questions about RMS normalization.

  1. Which dimensions should be normalized in a CNN? In the PyTorch code, the default axis is -1, which is the width dimension of a PyTorch CNN tensor. In the TensorFlow code, however, the last axis is the channel dimension.

  2. Can the normalization be applied over other dimensions as well? For example, in the CIFAR-10 experiments, LayerNorm was applied over the width and height dimensions.

Thank you.

@avinashsai Thanks for pointing this out.

  1. The PyTorch (rmsnorm_torch) and TensorFlow (rmsnorm_tensorflow) code do NOT handle the CNN case. By default, the code is meant for RNN, feed-forward, and attention networks, where the normalization is applied to the last dimension.

  2. For CNNs, I follow the LayerNorm setup and apply the normalization over the width and height dimensions. Please refer to the CIFAR-10 Classification section of the README for more details.
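
To make the difference between the two cases concrete, here is a minimal NumPy sketch. It is not the code from this repository: the helper name `rms_norm`, the `eps` placement, and the tensor shapes are assumptions for illustration, and the learned gain is left out. It shows that the last axis is width for an NCHW tensor but channels for an NHWC tensor, and how the statistic could instead be taken over height and width together.

```python
import numpy as np

def rms_norm(x, axes, eps=1e-8):
    """Divide x by its root mean square computed over `axes` (no learned gain)."""
    rms = np.sqrt(np.mean(np.square(x), axis=axes, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
x_nchw = rng.standard_normal((2, 3, 4, 5))   # (batch, channels, height, width)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))  # (batch, height, width, channels)

# Last axis only: this is width for NCHW, but channels for NHWC.
y_width = rms_norm(x_nchw, axes=-1)
y_channels = rms_norm(x_nhwc, axes=-1)

# Height and width together, per sample and per channel (NCHW layout).
y_spatial = rms_norm(x_nchw, axes=(2, 3))
```

After normalization, the mean of the squared values over the chosen axes is approximately 1, so the choice of `axes` decides which group of activations shares one scale.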

Biao