MaskNet: replace BatchNorm with LayerNorm
zippeurfou opened this issue · 6 comments
The MaskNet paper uses LayerNorm; however, the code implementation uses BatchNorm.
Hi Marc,
Thanks for bringing this up! This is indeed a bug, and we are fixing it.
Hi Marc,
Upon checking, this is not a bug. When applying BatchNorm on the default axis (the last dim), BatchNorm reduces to LayerNorm, and since the size of gamma/beta depends on the shape of the input tensor, the original implementation is still correct.
However, for the clarity of the code, we updated the example (ref PR #816).
Thanks for the comment!
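(If it helps, here is one way to compare the two layers numerically. This is only a sketch assuming the TF2 Keras layers; the tensor shape and values are illustrative and not taken from the MaskNet example.)

```python
import tensorflow as tf

x = tf.random.normal([4, 8])

bn = tf.keras.layers.BatchNormalization(axis=-1)   # gamma/beta sized to the last dim
ln = tf.keras.layers.LayerNormalization(axis=-1)   # gamma/beta sized to the last dim

# Print the largest elementwise difference in training and in inference mode,
# so you can see for yourself how close (or not) the two layers are.
print(tf.reduce_max(tf.abs(bn(x, training=True) - ln(x))))
print(tf.reduce_max(tf.abs(bn(x, training=False) - ln(x))))
```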
Because your code isn't in training.
tf.layers.batch_normalization() calls into the class BatchNormalizationBase, while tf.keras.layers.LayerNormalization() calls into the class LayerNormalization.
In LayerNormalization, mean and var are computed by nn.moments, and then nn.batch_normalization is used to get the result.
DeepRec/tensorflow/python/keras/layers/normalization.py
Lines 1040 to 1046 in 6bd822e
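(A minimal sketch of that computation, assuming the public TF2 APIs; the shape is illustrative, and epsilon is set to the layer's default of 1e-3.)

```python
import tensorflow as tf

x = tf.random.normal([4, 8])

# Per-sample moments over the last axis, then nn.batch_normalization,
# which is the computation the referenced lines perform.
mean, var = tf.nn.moments(x, axes=[-1], keepdims=True)
manual = tf.nn.batch_normalization(x, mean, var, offset=None, scale=None,
                                   variance_epsilon=1e-3)

ln = tf.keras.layers.LayerNormalization(axis=-1, epsilon=1e-3)
# With the default gamma=1 / beta=0, the two should agree up to numerical precision.
print(tf.reduce_max(tf.abs(manual - ln(x))))
```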
It is the same as BN without the other features.
DeepRec/tensorflow/python/keras/layers/normalization.py
Lines 643 to 652 in 6bd822e
DeepRec/tensorflow/python/keras/layers/normalization.py
Lines 736 to 739 in 6bd822e
DeepRec/tensorflow/python/keras/layers/normalization.py
Lines 820 to 825 in 6bd822e
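(For comparison, a sketch of the same formula with the moments taken over the batch axis, which, with default settings, is how BatchNormalization normalizes in training mode; the shape is again illustrative, and this is my own sketch rather than a quote of the referenced lines.)

```python
import tensorflow as tf

x = tf.random.normal([4, 8])

# Same nn.batch_normalization formula, but with one mean/var per feature
# computed across the batch, rather than one mean/var per sample.
mean, var = tf.nn.moments(x, axes=[0], keepdims=True)
manual = tf.nn.batch_normalization(x, mean, var, offset=None, scale=None,
                                   variance_epsilon=1e-3)

bn = tf.keras.layers.BatchNormalization(axis=-1, epsilon=1e-3)
# In training mode the layer uses these batch statistics, so the outputs
# should agree up to numerical precision (default gamma=1, beta=0).
print(tf.reduce_max(tf.abs(manual - bn(x, training=True))))
```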
But the difference is that when you are not in training, the mean and var of BN will be replaced by the moving statistics.
DeepRec/tensorflow/python/keras/layers/normalization.py
Lines 744 to 750 in 6bd822e
You can add the input parameter moving_mean_initializer='ones' (which defaults to 'zeros') and find that the output changes.
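(Here is a sketch of that experiment, assuming the TF2 Keras BatchNormalization layer; the tensor shape is illustrative.)

```python
import tensorflow as tf

x = tf.random.normal([4, 8])

bn_default = tf.keras.layers.BatchNormalization(axis=-1)  # moving_mean starts at zeros
bn_ones = tf.keras.layers.BatchNormalization(axis=-1, moving_mean_initializer='ones')

# In inference mode the layer normalizes with the moving statistics,
# so changing their initializer changes the output.
print(tf.reduce_max(tf.abs(bn_default(x, training=False) - bn_ones(x, training=False))))

# In training mode the batch statistics are used instead,
# so the initializer has no effect on the output.
print(tf.reduce_max(tf.abs(bn_default(x, training=True) - bn_ones(x, training=True))))
```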
Thanks @Duyi-Wang, it makes sense. I was confused by it as well, but the doc clearly states it. Thanks for pointing out the code.
Adding a screenshot for posterity.
Feel free to close this one.