Add batch normalization
aromanro opened this issue · 3 comments
Seems easy to add, but I'm postponing this for later.
Paper for this: Ioffe & Szegedy 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
The original paper adds batch normalization before the activation function. Because of that, it needs trainable parameters to linearly transform the normalized result. Implementing it that way would require too many changes to the code, so I chose the lazy path: add it after the activation. This way it does not need the linear transformation; that role can be taken by the next layer.
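To make the idea concrete, here is a minimal NumPy sketch of this post-activation variant: the layer only normalizes (no trainable gamma/beta), since the next layer's weights can absorb the linear transformation. This is an illustration of the approach, not the repository's actual code; the class name, the `momentum` value, and the running-statistics scheme are my assumptions.

```python
import numpy as np

class PostActivationBatchNorm:
    """Batch normalization applied AFTER the activation, without the
    trainable scale/shift (gamma/beta) of the original paper.
    Hypothetical sketch; names and defaults are illustrative."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        # running statistics, used at inference time
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        # x: (batch_size, num_features), already passed through the activation
        if training:
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # update running statistics (exponential moving average)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        # plain normalization, no learnable scale/shift
        return (x - mean) / np.sqrt(var + self.eps)
```

In use, the layer would sit between one layer's activation output and the next layer's input, e.g. `bn.forward(relu(z), training=True)` during training and `training=False` at inference, where the stored running statistics replace the per-batch ones.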
There is some discussion on this topic (for example https://stackoverflow.com/questions/55827660/batchnormalization-implementation-in-keras-tf-backend-before-or-after-activa, and here, where better results are reported for placing it after the activation: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md). Although having it before the activation has its merits, it's worth trying it after the activation, too.