About the batchnorm in CoordAttention
hello-trouble opened this issue · 1 comment
hello-trouble commented
Hello, thank you for your excellent work on this attention module. I am a little puzzled by the code: compared to SENet, there is a BatchNorm operation in CoordAttention. Is it necessary for the attention mechanism? In addition, is it necessary to replace the ReLU6-based operation (the self.relu(x + 3) / 6) with an ordinary ReLU when the inputs are normalized between -1 and 1?
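For context, here is a minimal PyTorch sketch (not the repository's exact code) of the two pieces being asked about: a hard-sigmoid built from ReLU6, i.e. ReLU6(x + 3) / 6, and a reduced 1x1 convolution followed by BatchNorm as it typically appears in a coordinate-attention block. The class and parameter names (HSigmoid, SimpleCoordGate, reduction) are illustrative, not identifiers from this repository.

```python
import torch
import torch.nn as nn


class HSigmoid(nn.Module):
    """Hard-sigmoid: ReLU6(x + 3) / 6, a piecewise-linear approximation of sigmoid bounded to [0, 1]."""
    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class SimpleCoordGate(nn.Module):
    """Illustrative reduction stage: 1x1 conv -> BatchNorm -> activation.
    The BatchNorm here is the kind of normalization the question refers to,
    which an SE-style block usually omits."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)  # normalizes the reduced features before the activation
        self.act = HSigmoid()

    def forward(self, x):
        return self.act(self.bn1(self.conv1(x)))


if __name__ == "__main__":
    x = torch.randn(2, 64, 16, 16)
    gate = SimpleCoordGate(64)
    print(gate(x).shape)  # torch.Size([2, 8, 16, 16])
```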
houqb commented
In mobile network training, it would be better to use ReLU6 or Swish, which is smooth. MobileNetV3 has demonstrated this.
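For illustration, a minimal sketch of hard-swish, the ReLU6-based approximation of Swish used in MobileNetV3; the HSwish name is illustrative and not necessarily the identifier used in this repository.

```python
import torch
import torch.nn as nn


class HSwish(nn.Module):
    """Hard-swish as used in MobileNetV3: x * ReLU6(x + 3) / 6."""
    def __init__(self, inplace=True):
        super().__init__()
        self.relu6 = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return x * self.relu6(x + 3) / 6


if __name__ == "__main__":
    x = torch.linspace(-4, 4, steps=9)
    # Unlike plain ReLU, the output varies gradually around zero instead of clipping hard at 0.
    print(HSwish(inplace=False)(x))
```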