CQFIO/FastImageProcessing

You can use slim.batch_norm(scale=True) to achieve the same ability as Adaptive Normalization

mzh0 opened this issue · 5 comments

mzh0 commented

[screenshot of the question omitted]

CQFIO commented

slim.batch_norm(scale=True) is effectively equivalent to slim.batch_norm(scale=False) in our case:

    scale: If True, multiply by `gamma`. If False, `gamma` is
      not used. When the next layer is linear (also e.g. `nn.relu`), this can be
      disabled since the scaling can be done by the next layer.

described in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py
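
To spell out the reasoning in that docstring (a sketch, writing \hat{x} for the normalized activations and W, b for the weights and bias of the next linear layer):

    W (\gamma \odot \hat{x} + \beta) + b = (W diag(\gamma)) \hat{x} + (W \beta + b)

Whatever the per-channel scale \gamma does can therefore be absorbed into W by the next layer, which is why the explicit scale is treated as redundant there.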

Yesterday I also tried to run a model with a normalization function like this for the Rudin-Osher-Fatemi (ROF) task:

import tensorflow.contrib.slim as slim

def nm(x):
    return slim.batch_norm(x, scale=True)

But the performance is similar to slim.batch_norm(x, scale=False): it gets an MSE of 56, while our adaptive normalization can achieve an MSE of 0.6.

The difference might come from different parametrization or initialization.
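
For reference, the adaptive normalization being compared against looks roughly like this (a minimal sketch, not the exact training code; w0 and w1 are the learned scalar weights):

import tensorflow as tf
import tensorflow.contrib.slim as slim

def adaptive_nm(x):
    # Learned scalar weights; initializing w0 = 1, w1 = 0 makes the layer
    # start out as the identity mapping.
    w0 = tf.Variable(1.0, name='w0')
    w1 = tf.Variable(0.0, name='w1')
    return w0 * x + w1 * slim.batch_norm(x)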

mzh0 commented

What makes scale=True equivalent to scale=False for CAN?

[screenshots of the batch normalization and adaptive normalization formulas omitted]

  • \mu is default initialized to 0
  • \sigma is default initialized to 1
  • \gamma is default initialized to 1
  • \beta is default initialized to 0

As you initialize w_0 = 1 and w_1 = 0, the two normalizations should have the same starting point. I would be very interested to see how the weights of the two parametrizations vary over training.
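
One simple way to watch that (a TF 1.x style sketch, purely illustrative; it just fetches any variable whose name contains the normalization parameters):

import tensorflow as tf

# Collect the normalization parameters by name (w0/w1 from the adaptive
# version, gamma/beta from slim.batch_norm).
norm_vars = [v for v in tf.global_variables()
             if any(k in v.name for k in ('w0', 'w1', 'gamma', 'beta'))]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps here, then periodically: ...
    values = sess.run({v.name: v for v in norm_vars})
    print(values)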

CQFIO commented

I also tried an experiment with adaptive normalization where w_0 = 0 and w_1 = 1. The performance is not good, with an MSE of 37 for ROF. So the initialization matters a lot.

On the other hand, batch normalization may not be suitable for an identity mapping, since the batch statistics \mu and \sigma keep changing during training. To achieve a perfect identity mapping we would need \gamma = \sigma and \beta = \mu, but \mu and \sigma are recomputed for every batch. It seems hard to keep \gamma = \sigma and \beta = \mu close by gradient descent.
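
Writing this out in the same notation:

    BN(x) = \gamma * (x - \mu) / \sigma + \beta = x  (for all x)
    <=>  \gamma = \sigma  and  \beta = \mu

So \gamma and \beta would have to track the per-batch statistics exactly, which gradient descent can only do approximately.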

mzh0 commented

    I also try an experiment with adaptive normalization where w_0 = 0 and w_1 = 0. The performance is not good with MSE at 37 for ROF. So the initialization matters a lot.

Do you mean w_0 = 0 and w_1 = 1?

CQFIO commented

Right, I meant w_0 = 0 and w_1 = 1. Sorry for the typo.