About the batchnorm in CoordAttention
hello-trouble opened this issue · 1 comment
hello-trouble commented
Hello, thank you for your excellent work on this attention module. I am a little puzzled by the code: compared to SENet, there is a BatchNorm operation in CoordAttention. Is it necessary for the attention mechanism? In addition, is it necessary to replace the ReLU6-based operation (the self.relu(x + 3) / 6) with an ordinary ReLU when the inputs are normalized between -1 and 1?
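For context, here is a minimal PyTorch sketch (not the repository's exact code) of the two pieces being asked about: a hard-sigmoid built from ReLU6, i.e. ReLU6(x + 3) / 6, and a reduced 1x1 convolution followed by BatchNorm as it typically appears in a coordinate-attention block. The class and parameter names (HSigmoid, SimpleCoordGate, reduction) are illustrative, not identifiers from this repository.

```python
import torch
import torch.nn as nn


class HSigmoid(nn.Module):
    """Hard-sigmoid: ReLU6(x + 3) / 6, a piecewise-linear approximation of sigmoid bounded to [0, 1]."""
    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class SimpleCoordGate(nn.Module):
    """Illustrative reduction stage: 1x1 conv -> BatchNorm -> activation.
    The BatchNorm here is the kind of normalization the question refers to,
    which an SE-style block usually omits."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)  # normalizes the reduced features before the activation
        self.act = HSigmoid()

    def forward(self, x):
        return self.act(self.bn1(self.conv1(x)))


if __name__ == "__main__":
    x = torch.randn(2, 64, 16, 16)
    gate = SimpleCoordGate(64)
    print(gate(x).shape)  # torch.Size([2, 8, 16, 16])
```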
houqb commented
In mobile network training, it would be better to use ReLU6 or Swish, which is smooth. MobileNetV3 has demonstrated this.
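For illustration, a minimal sketch of hard-swish, the ReLU6-based approximation of Swish used in MobileNetV3; the HSwish name is illustrative and not necessarily the identifier used in this repository.

```python
import torch
import torch.nn as nn


class HSwish(nn.Module):
    """Hard-swish as used in MobileNetV3: x * ReLU6(x + 3) / 6."""
    def __init__(self, inplace=True):
        super().__init__()
        self.relu6 = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return x * self.relu6(x + 3) / 6


if __name__ == "__main__":
    x = torch.linspace(-4, 4, steps=9)
    # Unlike plain ReLU, the output varies gradually around zero instead of clipping hard at 0.
    print(HSwish(inplace=False)(x))
```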