
Why using the global mean BN gamma in the whole network?


It seems that the mean of the BN gammas across all layers is used as the "mean term" by default. However, I believe that BN gamma values can vary over a large range between layers, so I want to know whether using the global mean is a good choice.

The mean values of different BN layers are not very different. We visualize the BN gammas in the appendix (Figure 6); as you can see, the gammas in different BN layers fall in similar ranges.

The goal of Polarization Pruning is to identify which neurons are less important across the whole network. For example, if the gammas of layer i are very large and the gammas of layer j are very small, we want to prune neurons in layer j and keep neurons in layer i. The global mean helps us achieve this goal. Another choice is to use a separate mean value for each individual BN layer; in that case, neuron importance is only compared within each layer. A sketch of the difference is shown below.
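A minimal sketch (not the repository's actual code) of what "global mean" versus "per-layer mean" of BN gammas looks like in PyTorch, assuming the prunable layers use `nn.BatchNorm2d`; the function name `bn_gamma_means` is hypothetical:

```python
# Sketch: compare a single network-wide mean of BN gammas against
# per-layer means. Not the repository's implementation.
import torch
import torch.nn as nn


def bn_gamma_means(model: nn.Module):
    """Return the global mean over all BN gammas and a per-layer mean dict."""
    all_gammas = []
    per_layer_mean = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            gamma = module.weight.detach().abs()
            all_gammas.append(gamma)
            per_layer_mean[name] = gamma.mean().item()
    global_mean = torch.cat(all_gammas).mean().item()
    return global_mean, per_layer_mean


if __name__ == "__main__":
    from torchvision.models import resnet18
    model = resnet18()
    g_mean, l_means = bn_gamma_means(model)
    print(f"global mean of BN gammas: {g_mean:.4f}")
    for name, m in list(l_means.items())[:3]:
        print(f"{name}: {m:.4f}")
```

With the global mean, a neuron's gamma is compared against the network-wide average, so a layer whose gammas are uniformly small (layer j in the example above) is pruned more heavily; with per-layer means, every layer keeps roughly the same fraction of its neurons regardless of how important the layer is overall.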

If anything is still unclear, feel free to let me know.

This issue is closed since there has been no activity for more than a week.