Why is there no activation function in the attention module?
RicoSuaveGuapo opened this issue · 1 comment
Thanks for your excellent work. I have a quick question about the model structure.
In SegNeXt/mmseg/models/backbones/mscan.py, lines 59 to 91 at b53d601,
we can see convolutions, element-wise additions, and an element-wise product. However, there is no activation function between these operations. In other words, without a non-linear activation, these ops can be reduced to a single matrix operation.
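A minimal sketch of the merging argument for the purely linear part (the kernel sizes and channel count below are illustrative, not taken from mscan.py): two depthwise strip convolutions applied back to back with no activation in between collapse into one dense kernel.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C = 4                                  # illustrative channel count (depthwise, groups=C)
x = torch.randn(1, C, 16, 16)

k_v = torch.randn(C, 1, 7, 1)          # 7x1 vertical strip kernel
k_h = torch.randn(C, 1, 1, 7)          # 1x7 horizontal strip kernel

# Sequential strip convolutions with no activation in between: a purely linear map.
y_seq = F.conv2d(x, k_v, padding=(3, 0), groups=C)
y_seq = F.conv2d(y_seq, k_h, padding=(0, 3), groups=C)

# Merged kernel: composing a 7x1 and a 1x7 cross-correlation is the outer product
# of the two strips, i.e. a single dense 7x7 kernel per channel.
k_merged = k_v * k_h                   # broadcasts to (C, 1, 7, 7)
y_merged = F.conv2d(x, k_merged, padding=(3, 3), groups=C)

print(torch.allclose(y_seq, y_merged, atol=1e-5))   # True: the stack is one linear op
```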
I understand that the SpatialAttention module, which wraps the attention module, has a GELU, so the non-linearity can be provided by it:
SegNeXt/mmseg/models/backbones/mscan.py, lines 94 to 101 at b53d601
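For reference, a simplified sketch of how the block around the attention looks (module and variable names here are illustrative, not copied from mscan.py): the GELU sits just before the attention, and the attention itself is convolutions and additions followed by an element-wise product that gates the input features.

```python
import torch
import torch.nn as nn

class LinearAttentionSketch(nn.Module):
    """Stand-in for the purely convolutional attention: convs + adds, then a gate."""
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv_h = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)
        self.conv_mix = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x
        attn = self.conv0(x)
        attn = attn + self.conv_v(self.conv_h(attn))   # element-wise plus of conv branches
        attn = self.conv_mix(attn)
        return attn * u                                # element-wise product (the gate)

class SpatialAttentionSketch(nn.Module):
    """The wrapper that provides the GELU mentioned above."""
    def __init__(self, dim):
        super().__init__()
        self.proj_1 = nn.Conv2d(dim, dim, 1)
        self.activation = nn.GELU()
        self.gate = LinearAttentionSketch(dim)
        self.proj_2 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        return x + self.proj_2(self.gate(self.activation(self.proj_1(x))))

blk = SpatialAttentionSketch(32)
y = blk(torch.rand(1, 32, 16, 16))     # same shape in and out
```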
But I cannot figure out the reason for using only linear ops inside the attention module. Is there a good reason for this, or am I simply missing something here?
Good question.
I have asked myself the same question and tried to merge them into a 21 x 21 matrix. However, I cannot factor that merged matrix back into a 21 x 1 and a 1 x 21 matrix.
Actually, merging them into a 21 x 21 matrix is more expensive than the current version.
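A rough back-of-the-envelope comparison for the largest scale only (per channel of a depthwise convolution; the actual branch also contains smaller scales):

```python
dense = 21 * 21            # weights per channel for one merged 21x21 kernel
strips = 1 * 21 + 21 * 1   # weights per channel for a 1x21 followed by a 21x1 strip

print(dense, strips)       # 441 vs 42: the merged kernel is roughly 10x more expensive
# FLOPs scale the same way (each output pixel touches 441 vs 42 weights), which is
# why keeping the factored strip convolutions is cheaper, even though a general
# 21x21 kernel cannot be factored exactly into a 21x1 and a 1x21.
```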