DingXiaoH/ACNet

Question about the behavior of BN layers in inference

CoinCheung opened this issue · 4 comments

Hi,

Thanks for bringing this great work to the community!! There is something I still cannot make clear after reading the paper.

In the paper, you mention that in each of the AC branches there is a BN layer after the convolution layer (square or asymmetric). In the inference phase, the conv weights are added up, with the asymmetric conv weights added to the skeleton part of the square conv weights. However, I do not understand how the BN layers are fused at inference. After going through the code, I find that you seem to fuse the BN layers into the conv layers, so there are no BN layers left in the model at inference time. If we do not fuse the BN layers into the convolution layers, how should we deal with the BN layers in the three branches? Should I simply sum up the weights and running statistics to make a single BN layer?

Thank you for your interest in this paper. We must fuse the BN layers first and then merge the three branches. You can do some maths for verification.
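
Concretely, here is a minimal PyTorch sketch of the two steps, assuming standard `Conv2d`/`BatchNorm2d` modules (the function names are illustrative, not this repo's actual conversion code). BN folding computes `W' = W * gamma / sqrt(var + eps)` and `b' = beta - mean * gamma / sqrt(var + eps)` per output channel; the fused asymmetric kernels are then added onto the center row/column (the skeleton) of the fused square kernel:

```python
import torch

def fuse_conv_bn(conv_w, bn):
    # Fold an inference-time BN into the preceding conv:
    # y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    #   = (gamma / std) * conv(x) + (beta - mean * gamma / std)
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                      # per-output-channel factor
    return conv_w * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale

def merge_branches(w3x3, b3x3, w1x3, b1x3, w3x1, b3x1):
    # Add the fused asymmetric kernels onto the skeleton of the fused 3x3 kernel.
    w = w3x3.clone()
    w[:, :, 1:2, :] += w1x3   # 1x3 kernel -> center row
    w[:, :, :, 1:2] += w3x1   # 3x1 kernel -> center column
    return w, b3x3 + b1x3 + b3x1
```

Because convolution is linear and the three branches' paddings make their outputs align, the resulting single 3x3 conv (with bias) reproduces the sum of the three conv-BN branches exactly, up to floating-point error.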

That would result in a model without BN layers. How could I merge the branches so that I end up with conv-BN structures, which is exactly the off-the-shelf model?

We never need a conv-BN structure in deployed models, as an inference-time BN layer is just a per-channel linear transformation. Actually, for the deployment of a regular CNN, you should *always* fuse the BN layer into the conv kernel, manually or automatically (if your platform can do so), for efficiency.
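
For anyone who wants to convince themselves, here is a quick self-contained sanity check (toy shapes, not code from this repo) that folding BN into the conv preserves the output in eval mode:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16).eval()            # eval mode: uses running statistics
bn.running_mean.normal_()
bn.running_var.uniform_(0.5, 2.0)

fused = nn.Conv2d(8, 16, 3, padding=1, bias=True)
with torch.no_grad():
    scale = bn.weight / (bn.running_var + bn.eps).sqrt()
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    fused.bias.copy_(bn.bias - bn.running_mean * scale)

    x = torch.randn(2, 8, 32, 32)
    print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True
```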

Thanks for replying!!! I am closing this :)