JDAI-CV/CoTNet

Does the block described in this paper use involution?

JasonLeeFdu opened this issue · 8 comments

Do Equation 5 in the paper and the LocalConvolution / aggregation_zeropad methods in the code borrow from involution? If so, what is the difference?

Also, I'd like to ask: CotLayer in the code is the contribution of this paper, so what is CoXtLayer for?

The questions can be answered in English.

YehLi commented

Involution shares a similar spirit with the paper "Pay Less Attention with Lightweight and Dynamic Convolutions".

There are two main differences between CoTNet and Involution:

1. CoTNet mines the static context among keys via a 3×3 convolution.
2. CoTNet performs self-attention based on the query and the contextualized key, whereas Involution directly generates the kernel with a 1×1 convolution.
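
A minimal PyTorch sketch of that contrast (the module and variable names here are illustrative only, not the ones used in this repo; the actual CotLayer additionally uses grouped convolutions, normalization, and a fusion step):

```python
import torch
import torch.nn as nn


class InvolutionStyleKernel(nn.Module):
    """Involution-style: the k*k dynamic kernel is generated directly
    from the input feature with 1x1 convolutions."""
    def __init__(self, dim, kernel_size=3, reduction=4):
        super().__init__()
        self.gen = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, kernel_size * kernel_size, kernel_size=1),
        )

    def forward(self, x):            # x: (B, C, H, W)
        return self.gen(x)           # (B, k*k, H, W) dynamic kernel


class CoTStyleAttention(nn.Module):
    """CoT-style: a k x k convolution first mines the static context among
    the keys; the attention map is then predicted from the concatenation
    of the query and the contextualized key."""
    def __init__(self, dim, kernel_size=3, reduction=4):
        super().__init__()
        self.key_embed = nn.Sequential(           # static context among keys
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.Sequential(                # query + key context -> attention
            nn.Conv2d(2 * dim, dim // reduction, kernel_size=1, bias=False),
            nn.BatchNorm2d(dim // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, kernel_size * kernel_size, kernel_size=1),
        )

    def forward(self, x):                               # x acts as both query and key
        k_ctx = self.key_embed(x)                       # contextualized key
        return self.attn(torch.cat([x, k_ctx], dim=1))  # (B, k*k, H, W) attention map
```

The key point is that the CoT-style attention map is conditioned on both the query and the 3×3-contextualized key, whereas the involution-style kernel is predicted from the input feature alone.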

CoXtLayer is similar to CotLayer, but with a higher dimension and two groups.

I am confused about the formulation mentioned in the paper:
the output channels of the second 1×1 convolution are defined as k×k×Ch.
In the paper, you explained that Ch is the number of heads and k×k is the local grid in space.
Can I understand it like this: in a transformer block we usually define a hyper-parameter for the number of heads (Ch), and then we reshape the output channels into (Ch, k×k)?
Another question: you used LocalConvolution, and I do not know why.
Can you explain? Thank you.

YehLi commented

In the CoT block, we reshape the output channels into (Ch, k×k).

LocalConvolution is used for aggregating all values within each k × k grid with the learnt local attention matrix in equation 3. Section 3.4 discusses the connections between self-attention and dynamic region-aware convolution.
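
To make that aggregation step concrete: the repo implements it with a custom CUDA op (LocalConvolution / aggregation_zeropad), but its effect can be approximated in pure PyTorch with F.unfold. The sketch below is only an illustration of that idea, with my own function and argument names, and it assumes the attention logits are the output of the second 1×1 convolution whose k×k×Ch channels are reshaped into (Ch, k×k):

```python
import torch
import torch.nn.functional as F


def local_aggregation(values, attn_logits, num_heads, kernel_size=3):
    """Weighted sum of the values inside each k x k grid (zero padding),
    with one k x k attention map per head shared across that head's channels.

    values:      (B, C, H, W)
    attn_logits: (B, num_heads * k*k, H, W) -- output of the second 1x1 conv,
                 reshaped below into (num_heads, k*k)
    returns:     (B, C, H, W)
    """
    B, C, H, W = values.shape
    k2 = kernel_size * kernel_size
    pad = kernel_size // 2

    # reshape channels into (heads, k*k) and normalize over the k*k grid
    attn = attn_logits.view(B, num_heads, k2, H, W).softmax(dim=2)

    # gather the k x k neighborhood of every position: (B, C * k*k, H*W)
    v = F.unfold(values, kernel_size, padding=pad)
    v = v.view(B, num_heads, C // num_heads, k2, H, W)

    # weighted sum over the k*k grid
    out = (attn.unsqueeze(2) * v).sum(dim=3)
    return out.reshape(B, C, H, W)


# e.g. values of shape (2, 64, 32, 32), 4 heads, 3x3 grid:
# out = local_aggregation(values, attn_logits, num_heads=4)  # attn_logits: (2, 36, 32, 32)
```

Whether the softmax and any further normalization happen before or inside the CUDA op may differ from this sketch; the point is simply that each output position is a dynamic, attention-weighted combination of the values in its local k×k grid.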

Hello, could you explain LocalConvolution in more detail?

OK, thanks, I understand now.