Add batch normalization
aromanro opened this issue · 3 comments
Seems easy to add, but I'm postponing this for later.
Paper for this: Ioffe & Szegedy 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
The original paper adds batch normalization before the activation function. Because of that, it needs trainable parameters to linearly transform the normalized result. Implementing it that way would require too many changes to the code, so I chose the lazy path: add it after the activation. This way it does not need the linear transformation; that role can be taken by the next layer.
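To make the idea concrete, here is a minimal NumPy sketch of this post-activation variant: the layer only normalizes (no trainable gamma/beta), since the next layer's weights can absorb the linear transformation. This is an illustration of the approach, not the repository's actual code; the class name, the `momentum` value, and the running-statistics scheme are my assumptions.

```python
import numpy as np

class PostActivationBatchNorm:
    """Batch normalization applied AFTER the activation, without the
    trainable scale/shift (gamma/beta) of the original paper.
    Hypothetical sketch; names and defaults are illustrative."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        # running statistics, used at inference time
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        # x: (batch_size, num_features), already passed through the activation
        if training:
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # update running statistics (exponential moving average)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        # plain normalization, no learnable scale/shift
        return (x - mean) / np.sqrt(var + self.eps)
```

In use, the layer would sit between one layer's activation output and the next layer's input, e.g. `bn.forward(relu(z), training=True)` during training and `training=False` at inference, where the stored running statistics replace the per-batch ones.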
There is some discussion on this topic (for example https://stackoverflow.com/questions/55827660/batchnormalization-implementation-in-keras-tf-backend-before-or-after-activa, and here, where better results are reported for placing it after the activation: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md). Although having it before the activation has its merits, it's worth trying it after the activation, too.