Implementation of pre-training
physics2014 opened this issue · 9 comments
In the implementation of pre-training, you first train a real-valued resnet to initialize the BNN, with the same hyperparameter settings as the original resnet. The architectures of the Bi-Real-net and the standard ResNet are different, so which one do you use for pre-training?
If you use the architecture of the standard resnet, is it effective to load the pre-trained weights into the Bi-Real-net, which has a different inference graph?
If you use the architecture of the Bi-Real-net, does it still work with the same hyperparameter settings as the original resnet, given that the pooling, batchnorm and activation layers are stacked differently in a binary CNN and a standard CNN?
I am wondering the same thing. Have you done any experiments @physics2014 ?
Also curious about this. For the 18-layer resnet, if I use the same architecture and hyperparameters as in 18-layer/Bi-Real-net-18-solver.prototxt for full-precision pretraining, I get to ~49% top-1 accuracy, significantly below the reported result. Has anyone had more success?
@koenhelwegen Is the ~49% top-1 obtained on the float model, or on the binary model initialized from the float model?
The binary model initialized from the full-precision model.
Interesting! What learning rate schedule did you use? Would you mind sharing the code?
@daquexian Would you mind sharing the code with me, too?
ps: I have mailed you, but maybe you were too busy to notice it :-D
To pre-train the Bi-Real Net 18, we use the same hyperparameter settings as the real-valued resnet, but the same architecture as the binary version of Bi-Real Net 18, except that the binary-convolution layers are replaced with real-valued convolution layers and the sign function is replaced with the ReLU function.
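For anyone reimplementing this, here is a minimal PyTorch-style sketch (not the repository's Caffe code) of what such a pre-training twin of a Bi-Real block could look like: the graph stays identical to the binary block, only the sign activation is swapped for ReLU and the binary convolution for a real-valued one. The class names and the `pretrain` flag are hypothetical.

```python
# Minimal sketch, assuming a PyTorch reimplementation of the Bi-Real basic block.
# Names (BiRealBlock, BinaryActivation, pretrain) are illustrative, not from the repo.
import torch
import torch.nn as nn


class BinaryActivation(nn.Module):
    """Sign activation with a straight-through-style gradient (illustrative)."""
    def forward(self, x):
        out_forward = torch.sign(x)
        # Clipped identity for the backward pass (straight-through estimator).
        out_backward = torch.clamp(x, -1.0, 1.0)
        return out_backward + (out_forward - out_backward).detach()


class BiRealBlock(nn.Module):
    """One basic block; pretrain=True gives the full-precision twin of the binary block."""
    def __init__(self, channels, pretrain=False):
        super().__init__()
        # Pre-training: ReLU instead of sign; training the BNN: sign activation.
        self.act = nn.ReLU(inplace=True) if pretrain else BinaryActivation()
        # In the binary model the weights of this conv are also binarized; a plain
        # nn.Conv2d stands in for both cases to keep the sketch short.
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Shortcut around every convolution layer, as in Bi-Real Net.
        return x + self.bn(self.conv(self.act(x)))
```

Because the two variants share the same parameter names and shapes in this sketch, initializing the binary model from the pre-trained one would just be copying the weights over, e.g. `binary_model.load_state_dict(pretrained_model.state_dict())`.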