Implementation of pre-training
physics2014 opened this issue · 9 comments
In the implementation of pre-training, you first train a real-valued resnet to initialize the BNN, with the same hyperparameter settings as the original resnet. The architectures of the Bi-Real-net and the standard ResNet are different, so which one do you use for pre-training?
If you use the architecture of the standard resnet, is it effective to load the pre-trained weights into the Bi-Real-net, which has a different inference graph?
If you use the architecture of the Bi-Real-net, does it still work with the same hyperparameter settings as the original resnet, given that the pooling, batchnorm and activation layers are stacked differently in a binary CNN and a standard CNN?
I am wondering the same thing. Have you done any experiments @physics2014 ?
Also curious about this. For the 18-layer resnet, if I use the same architecture and hyperparameters as in 18-layer/Bi-Real-net-18-solver.prototxt for full-precision pretraining, I get to ~49% top-1 accuracy, significantly below the reported result. Has anyone had more success?
@koenhelwegen Is the ~49% top-1 obtained on the float model, or on the binary model initialized from the float model?
The binary model initialized from the full-precision model.
Interesting! What learning rate schedule did you use? Would you mind sharing the code?
@daquexian Would you mind sharing the code with me, too?
ps: I have mailed you, but maybe you were too busy to notice it :-D
To pre-train the Bi-Real Net 18, we use the same hyperparameter settings as the real-valued resnet, but the same architecture as the binary version of Bi-Real Net 18, except that the binary-convolution layers are replaced with real-valued convolution layers and the sign function is replaced with the ReLU function.
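For anyone reimplementing this, here is a minimal PyTorch-style sketch (not the repository's Caffe code) of what such a pre-training twin of a Bi-Real block could look like: the graph stays identical to the binary block, only the sign activation is swapped for ReLU and the binary convolution for a real-valued one. The class names and the `pretrain` flag are hypothetical.

```python
# Minimal sketch, assuming a PyTorch reimplementation of the Bi-Real basic block.
# Names (BiRealBlock, BinaryActivation, pretrain) are illustrative, not from the repo.
import torch
import torch.nn as nn


class BinaryActivation(nn.Module):
    """Sign activation with a straight-through-style gradient (illustrative)."""
    def forward(self, x):
        out_forward = torch.sign(x)
        # Clipped identity for the backward pass (straight-through estimator).
        out_backward = torch.clamp(x, -1.0, 1.0)
        return out_backward + (out_forward - out_backward).detach()


class BiRealBlock(nn.Module):
    """One basic block; pretrain=True gives the full-precision twin of the binary block."""
    def __init__(self, channels, pretrain=False):
        super().__init__()
        # Pre-training: ReLU instead of sign; training the BNN: sign activation.
        self.act = nn.ReLU(inplace=True) if pretrain else BinaryActivation()
        # In the binary model the weights of this conv are also binarized; a plain
        # nn.Conv2d stands in for both cases to keep the sketch short.
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Shortcut around every convolution layer, as in Bi-Real Net.
        return x + self.bn(self.conv(self.act(x)))
```

Because the two variants share the same parameter names and shapes in this sketch, initializing the binary model from the pre-trained one would just be copying the weights over, e.g. `binary_model.load_state_dict(pretrained_model.state_dict())`.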