Dootmaan/MT-UNet

A question

stdcoutzrh opened this issue · 2 comments

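Hello, and thank you for your work on this paper.
After reading the paper and the code, I have some questions about the External Attention described in the paper. The corresponding code also just applies a few nn.Linear operations to the features x of the current batch:
1. When the paper says this module can learn across samples, does that refer to the samples in the current batch?
2. How is cross-sample learning reflected or implemented?
Looking forward to your reply. Thank you.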

Hi @stdcoutzrh, thank you for your question. You can refer to the original EA paper for a detailed explanation (highly recommended). In short, SA mainly learns self-affinity through the SELF-attention matrix, while EA uses a CROSS-SAMPLE matrix for optimization.

However, it should also be noted that this is more of a fancy theoretical explanation for EA, since the authors of EA admit that the implementation is very simple (not so different from the original feed-forward layer) and that EA performs so well mainly because of the softmax norm layer. Actually, inter-sample learning is also present in the original design of transformers (the QKV mapping, FFN, etc.), and we want to use EA to emphasize such an operation.
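To make the two questions above more concrete, here is a minimal sketch of an EA-style layer in PyTorch. It is an illustration under stated assumptions, not the code from this repository: the class name `ExternalAttentionSketch`, the `num_slots` parameter, and the tensor layout `(batch, tokens, dim)` are all hypothetical. The two bias-free `nn.Linear` layers play the role of the external memories, and the normalization is a softmax over tokens followed by an L1 normalization over memory slots.

```python
import torch
import torch.nn as nn


class ExternalAttentionSketch(nn.Module):
    """Hypothetical EA-style layer; names and layout are assumptions."""

    def __init__(self, dim: int, num_slots: int = 64):
        super().__init__()
        # The two "external memories" are ordinary learnable weights.
        # They do not depend on the input, so every sample shares them.
        self.mk = nn.Linear(dim, num_slots, bias=False)
        self.mv = nn.Linear(num_slots, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        attn = self.mk(x)                  # (batch, tokens, slots)
        attn = torch.softmax(attn, dim=1)  # softmax over the token axis
        # L1-normalize over the slot axis (the second normalization step)
        attn = attn / (1e-9 + attn.sum(dim=2, keepdim=True))
        return self.mv(attn)               # (batch, tokens, dim)


torch.manual_seed(0)
ea = ExternalAttentionSketch(dim=32, num_slots=8)
x = torch.randn(4, 16, 32)

# Forward on the whole batch vs. one sample at a time.
out = ea(x)
per_sample = torch.cat([ea(x[i:i + 1]) for i in range(x.shape[0])], dim=0)
same_as_per_sample = torch.allclose(out, per_sample, atol=1e-5)
print(out.shape, same_as_per_sample)
```

In this sketch the batched output matches the per-sample output, so samples in the current batch do not interact during the forward pass. Any cross-sample effect comes only from the shared weights `mk` and `mv`, which are updated by gradients from every training sample, much like the weights of a feed-forward layer.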

This issue is closed since no further activity has happened for a while.