I didn't get the same accuracy
Closed · 1 comment
scstu commented
I read the paper "Very Deep Convolutional Neural Network Based Image Classification Using Small Training Sample Size" carefully and found the following differences between the code and the original paper:
- The paper says "We add Batch Normalization layer before every nonlinearity" (that is, before the ReLU activation function), but here it comes after it?
- The momentum, base learning rate, and base weight decay rate are set to 0.9, 0.001, and 0.006 in the paper, but here the learning rate starts from 0.1?
- The original paper uses "mini-batch SGD", but here it's plain SGD.
Could you please explain this to me?
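For reference, the two layer orderings in question can be sketched in plain NumPy. This is a hypothetical, inference-style batch norm with no learned scale/shift, and the shapes are purely illustrative; it is not the repo's actual code.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Normalize each feature over the batch axis (no learned gamma/beta).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

x = np.random.randn(8, 4)  # toy batch of 8 samples, 4 features

paper_order = relu(batchnorm(x))  # BN before the nonlinearity, as the paper states
repo_order = batchnorm(relu(x))   # BN after the nonlinearity, as in this code
```

Note the observable difference: with BN last, each feature of the output is re-centered around zero, while with ReLU last the output is non-negative.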
geifmany commented
As I wrote, only the architecture is based on (or perhaps better, inspired by) the paper; the optimization and training procedure are not.
- For the position of the ReLU relative to the batch norm, both work; there is no consensus on whether to put it before or after.
- As I said, these are the parameters I found to work well for this setup.
- It is mini-batch SGD; have a look at the code.
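To illustrate the last point, "mini-batch SGD" just means each gradient is averaged over a slice of the data before the momentum update. Below is a hypothetical NumPy sketch on a toy least-squares problem, not the repo's training loop; the default values (lr 0.1, momentum 0.9, weight decay 0.006) simply echo the numbers discussed in this thread.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.006):
    # Classic momentum SGD; weight decay enters as an L2 gradient term.
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w, v = np.zeros(3), np.zeros(3)

batch_size = 16
for start in range(0, len(X), batch_size):
    xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)  # per-mini-batch least-squares gradient
    w, v = sgd_momentum_step(w, grad, v)
```

With batch_size equal to the dataset size this degenerates to full-batch gradient descent, which is the only distinction being made in the answer above.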