Does the block in this paper use involution?
JasonLeeFdu opened this issue · 8 comments
Is Equation 5 in the paper, along with the LocalConvolution / aggregation_zeropad method in the code, borrowed from involution? If so, what is different?
Also, the CotLayer in the code is the contribution of this paper, so what is CoXtLayer for?
(Answers in English are fine.)
Involution shares a similar spirit with the paper "Pay Less Attention with Lightweight and Dynamic Convolutions".
There are two main differences between CoTNet and Involution: 1. CoTNet mines the static context among keys via a 3×3 convolution. 2. CoTNet performs self-attention based on the query and the contextualized key, while Involution directly generates the kernel with a 1×1 convolution.
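A rough sketch of that difference (this is not the repository's exact CotLayer; the module names `key_embed`, `attn_gen`, `inv_gen` and all tensor sizes below are illustrative assumptions only):

```python
import torch
import torch.nn as nn

B, C, H, W, k = 2, 64, 16, 16, 3
x = torch.randn(B, C, H, W)

# --- CoT-style path (simplified): a k x k conv mines static context among keys,
#     then the attention logits are produced from the query concatenated with
#     that contextualized key.
key_embed = nn.Conv2d(C, C, k, padding=k // 2, groups=4)   # static context among keys
query = x                                                  # identity query mapping, for brevity
static_ctx = key_embed(x)
attn_gen = nn.Sequential(                                  # two 1x1 convs producing attention logits
    nn.Conv2d(2 * C, C, 1), nn.ReLU(inplace=True),
    nn.Conv2d(C, k * k * C // 8, 1),                       # C // 8 stands in for the head count Ch
)
cot_attn = attn_gen(torch.cat([query, static_ctx], dim=1)) # conditioned on query AND static context

# --- Involution-style path (simplified): the kernel is generated directly from
#     the input by 1x1 convs, with no query/key interaction (one group here).
inv_gen = nn.Sequential(
    nn.Conv2d(C, C // 4, 1), nn.ReLU(inplace=True),
    nn.Conv2d(C // 4, k * k, 1),
)
inv_kernel = inv_gen(x)

print(cot_attn.shape)    # (B, k*k*Ch, H, W)
print(inv_kernel.shape)  # (B, k*k, H, W)
```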
CoXtLayer is similar to CotLayer, but with a higher dimension and two groups.
I am confused about the function mentioned in the paper:
the output channels of the second 1×1 convolution are defined as k×k×Ch.
In the paper, you explained that Ch is the number of heads and k×k is the local grid in space.
Can I understand it like this: in a transformer block we usually define a hyper-parameter for the number of heads (Ch), and then we reshape the output channels into (Ch, k×k)?
Another question: why did you use LocalConvolution? I do not understand the reason.
Could you explain? Thank you.
In the CoT block, we reshape the output channels into (Ch, k×k).
LocalConvolution is used for aggregating all values within each k×k grid with the learnt local attention matrix in Equation 3. Section 3.4 discusses the connections between self-attention and dynamic region-aware convolution.
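To make the two steps concrete, here is a plain-PyTorch sketch that reproduces the reshape and the k×k aggregation with `F.unfold` instead of the CUDA aggregation_zeropad kernel; the function name `local_aggregate`, the softmax normalization, and the toy shapes are my own assumptions for illustration, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def local_aggregate(values, attn_logits, k=3, heads=4):
    """Sketch of the k x k local aggregation; the repository's
    LocalConvolution / aggregation_zeropad CUDA kernel computes the
    equivalent weighted sum more efficiently.

    values:      (B, C, H, W)              -- the value map V
    attn_logits: (B, heads * k * k, H, W)  -- output of the second 1x1 conv
    """
    B, C, H, W = values.shape
    # Reshape the output channels into (Ch, k*k): one k x k attention
    # matrix per head at every spatial position.
    attn = attn_logits.view(B, heads, k * k, H, W)
    attn = attn.softmax(dim=2)  # normalization over the k x k grid (an assumption here)

    # Gather each position's k x k neighbourhood of values (zero padding at the borders).
    unfolded = F.unfold(values, kernel_size=k, padding=k // 2)   # (B, C*k*k, H*W)
    unfolded = unfolded.view(B, heads, C // heads, k * k, H, W)

    # Weighted sum over the neighbourhood; every channel within a head shares its attention map.
    out = (unfolded * attn.unsqueeze(2)).sum(dim=3)              # (B, heads, C/heads, H, W)
    return out.view(B, C, H, W)

# Toy usage:
v = torch.randn(2, 64, 16, 16)
a = torch.randn(2, 4 * 9, 16, 16)
print(local_aggregate(v, a).shape)   # torch.Size([2, 64, 16, 16])
```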
Hello, could you explain LocalConvolution in more detail?
OK, thanks, I understand now.