I didn't get the same accuracy
Closed · 1 comment
scstu commented
I read the paper "Very Deep Convolutional Neural Network Based Image Classification Using Small Training Sample Size" carefully and found the following differences between the code and the original paper:
- The paper says "We add Batch Normalization layer before every nonlinearity" (that is, before the ReLU activation function), but here it comes after it?
- The momentum, base learning rate, and base weight decay rate are set to 0.9, 0.001, and 0.006 in the paper, but here the learning rate starts from 0.1?
- The original paper uses "mini-batch SGD", but here it's plain SGD.
Could you please explain this to me?
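For reference, the two layer orderings in question can be sketched in plain NumPy. This is a hypothetical, inference-style batch norm with no learned scale/shift, and the shapes are purely illustrative; it is not the repo's actual code.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Normalize each feature over the batch axis (no learned gamma/beta).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

x = np.random.randn(8, 4)  # toy batch of 8 samples, 4 features

paper_order = relu(batchnorm(x))  # BN before the nonlinearity, as the paper states
repo_order = batchnorm(relu(x))   # BN after the nonlinearity, as in this code
```

Note the observable difference: with BN last, each feature of the output is re-centered around zero, while with ReLU last the output is non-negative.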
geifmany commented
As I wrote, only the architecture is based on (or perhaps better, inspired by) the paper; the optimization and training procedure are not.
- For the position of the ReLU relative to the batch norm, both work; there is no consensus on whether to put it before or after.
- As I said, these are the parameters I found to work well for this setup.
- It is mini-batch SGD; have a look at the code.
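To illustrate the last point, "mini-batch SGD" just means each gradient is averaged over a slice of the data before the momentum update. Below is a hypothetical NumPy sketch on a toy least-squares problem, not the repo's training loop; the default values (lr 0.1, momentum 0.9, weight decay 0.006) simply echo the numbers discussed in this thread.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.006):
    # Classic momentum SGD; weight decay enters as an L2 gradient term.
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w, v = np.zeros(3), np.zeros(3)

batch_size = 16
for start in range(0, len(X), batch_size):
    xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)  # per-mini-batch least-squares gradient
    w, v = sgd_momentum_step(w, grad, v)
```

With batch_size equal to the dataset size this degenerates to full-batch gradient descent, which is the only distinction being made in the answer above.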