A question

Question

stdcoutzrh opened this issue 2 years ago · 2 comments

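Hello, and thank you for your work on this paper.
However, after reading the paper and the code, I am confused about the External Attention described in the paper. In the code, this part also just applies some nn.Linear operations to the features x of the current batch:
1. The paper says this module can learn across samples. Does that refer to the samples in the current batch?
2. How is this cross-sample learning reflected in, or implemented by, the module?
Looking forward to your reply. Thanks.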

Answer 1 · 2022-05-21T07:20:12.000Z

Hi @stdcoutzrh, thank you for your question. You can refer to the original EA (External Attention) paper for a detailed explanation (highly recommended). In short, SA (self-attention) mainly learns self-affinity within a sample through the SELF-attention matrix, while EA optimizes a CROSS-SAMPLE matrix, i.e. a learnable memory that is shared by all samples.
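To make the "cross-sample" point concrete, here is a minimal NumPy sketch of External Attention, assuming the double normalization from the EA paper (softmax over the token axis, then l1 normalization over the memory axis) and bias-free linear layers. The names `external_attention`, `mk` and `mv` are illustrative, not taken from this repository's code. The key point is that `mk` and `mv` are parameters shared by every sample, so they are learned from the whole dataset, while the forward pass for one sample does not look at the other samples in the batch.

```python
import numpy as np

def external_attention(x, mk, mv, eps=1e-9):
    """x: (B, N, d) token features; mk, mv: (S, d) learnable memories.

    mk and mv are shared by all samples, which is where the
    cross-sample learning comes from (through training, not the batch).
    """
    attn = x @ mk.T                                  # (B, N, S) token-to-memory affinity
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over the token axis N
    attn = attn / (attn.sum(axis=2, keepdims=True) + eps)  # l1 norm over memory axis S
    return attn @ mv                                 # (B, N, d)

rng = np.random.default_rng(0)
B, N, d, S = 4, 16, 8, 32
x = rng.standard_normal((B, N, d))
mk = rng.standard_normal((S, d))
mv = rng.standard_normal((S, d))

out = external_attention(x, mk, mv)
alone = external_attention(x[:1], mk, mv)  # first sample processed on its own
print(out.shape)                   # (4, 16, 8)
print(np.allclose(out[:1], alone))  # True: the batch does not mix samples
```

So, for question 1, in this sketch the cross-sample effect is not an interaction between samples of the current batch; it comes from the shared memories being updated by gradients from every training sample.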

However, it should also be noted that this is more of a fancy theoretical explanation for EA: the authors of EA admit that the implementation is very simple (not so different from the original feed-forward layer) and that EA performs so well mainly because of the softmax normalization layer. In fact, inter-sample learning is already present in the original transformer design (the k/q/v mappings, the FFN, etc.), since those weights are also shared by all samples, and we use EA to emphasize such operations.
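A small NumPy sketch can illustrate why the normalization matters, under the same assumptions as a textbook EA layer (bias-free memories `mk`, `mv` and double normalization; all names here are hypothetical). Without the normalization, the two linear layers collapse into a single d x d matrix; with it, the layer becomes nonlinear, playing a role similar to the activation in an FFN.

```python
import numpy as np

def double_norm(attn, eps=1e-9):
    # Softmax over the token axis (axis=1), then l1-normalize over the memory axis (axis=2).
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn / (attn.sum(axis=2, keepdims=True) + eps)

def ea(x, mk, mv, normalize=True):
    attn = x @ mk.T          # (B, N, S)
    if normalize:
        attn = double_norm(attn)
    return attn @ mv         # (B, N, d)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 8))
mk = rng.standard_normal((16, 8))
mv = rng.standard_normal((16, 8))

# Without normalization, EA is just one linear map x @ w with w = mk.T @ mv.
w = mk.T @ mv
linear_equal = np.allclose(ea(x, mk, mv, normalize=False), x @ w)
# With normalization, scaling the input no longer scales the output.
homogeneous = np.allclose(ea(2 * x, mk, mv), 2 * ea(x, mk, mv))
print(linear_equal)  # True
print(homogeneous)   # False
```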

Answer 2 · 2022-06-07T07:20:16.000Z

This issue is closed since no further activity has happened for a while.