Implementation issues of the Efficient Self-Attention module
K1t3 opened this issue · 0 comments
Firstly, thank you to all the authors for their impressive work. SegFormer has indeed demonstrated extraordinary performance.
While studying the code carefully, I noticed that the paper describes the Efficient Self-Attention module as applying a reshaping step followed by a linear projection to shorten the key/value sequence and thereby reduce the computational complexity of attention. However, after reading the `Attention` class multiple times, I can only find what appears to be an ordinary multi-head self-attention mechanism; I could not find any code that matches the Efficient Self-Attention module described in the paper.
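For reference, here is a minimal PyTorch sketch of what I expected the sequence-reduction step to look like, based on my reading of the paper. The names (`EfficientSelfAttention`, `sr_ratio`) and the choice of a strided `nn.Conv2d` as the reduction operator are my own assumptions, not necessarily what the repository uses:

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Sketch of the sequence-reduction idea from the paper (Eq. 2).
    Names like `sr_ratio` are my assumptions, not the authors' code."""
    def __init__(self, dim, num_heads=8, sr_ratio=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # Shrink the spatial map so K/V get N / sr_ratio**2 tokens
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape  # N == H * W
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        if self.sr_ratio > 1:
            # Fold tokens back into a feature map, reduce it, flatten again
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N', head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N')
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

With `sr_ratio=8` on a 32×32 feature map, K and V shrink from 1024 tokens to 16, which is where the N/R reduction (with R = sr_ratio²) from the paper would come from, if I understand it correctly.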
Did I misunderstand something, or is this part missing from the code? I hope to receive your answer; it is very important to me. Thanks!