It seems that the implementations of Channel-only self-attention and Spatial-only self-attention are swapped.
Thanks for sharing your work!
PSA/semantic-segmentation/network/PSA.py, lines 64 to 95 at commit 588b370
It seems that the `spatial_pool` function is actually the Channel-only self-attention module from the paper.
Can you explain in more detail what you mean?
Does the `spatial_pool` function implement Channel-only Self-Attention?
Does the `channel_pool` function implement Spatial-only Self-Attention?
OK, I see it now: the `spatial_pool` function should be renamed to `channel_pool`, and the `channel_pool` function should be renamed to `spatial_pool`.
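For reference, here is a minimal sketch of what the paper's Channel-only Self-Attention branch computes, i.e., what the function currently named `spatial_pool` appears to implement. The layer names and the C/2 reduction here are my reading of the paper, not the repo's exact code:

```python
import torch
import torch.nn as nn

class ChannelOnlySelfAttention(nn.Module):
    """Sketch of the paper's Channel-only Self-Attention branch.
    Layer names and the C/2 reduction are assumptions based on the
    paper's description, not the repo's exact code."""
    def __init__(self, channels):
        super().__init__()
        self.wq = nn.Conv2d(channels, 1, kernel_size=1)              # query: C -> 1
        self.wv = nn.Conv2d(channels, channels // 2, kernel_size=1)  # value: C -> C/2
        self.wz = nn.Conv2d(channels // 2, channels, kernel_size=1)  # project back to C
        self.ln = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        # softmax over the H*W spatial positions of the 1-channel query
        q = torch.softmax(self.wq(x).view(b, 1, h * w), dim=-1)
        v = self.wv(x).view(b, c // 2, h * w)
        # (B, C/2, HW) x (B, HW, 1) -> one aggregated value per channel
        z = torch.matmul(v, q.transpose(1, 2)).view(b, c // 2, 1, 1)
        z = self.wz(z).view(b, c)
        attn = torch.sigmoid(self.ln(z)).view(b, c, 1, 1)
        return x * attn  # per-channel gate: the "channel-only" behavior
```

The output is a `(B, C, 1, 1)` gate, one weight per channel, which is why this matches the Channel-only branch; the repo's `channel_pool` instead produces a per-position `(B, 1, H, W)` gate, which is the spatial-only behavior.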
I have found another discrepancy too: in the `channel_pool` function (which should be renamed to `spatial_pool`), `softmax` is called after `matmul`. But in the paper, in the Spatial-only Self-Attention block, `softmax` is applied before `matmul`.
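To see why the order matters, here is a small sketch with made-up shapes for the spatial branch (`q` standing in for the pooled query and `v` for the flattened value map). Since softmax is nonlinear, the two orderings generally give different results:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for the Spatial-only branch: q is the globally
# pooled query (B, 1, C/2), v is the value map flattened to (B, C/2, H*W).
b, c_half, hw = 2, 8, 16
q = torch.randn(b, 1, c_half)
v = torch.randn(b, c_half, hw)

# As described in the paper: softmax on the query first, then matmul.
out_paper = torch.matmul(F.softmax(q, dim=-1), v)   # (B, 1, H*W)

# As reported for channel_pool: matmul first, then softmax.
out_code = F.softmax(torch.matmul(q, v), dim=-1)    # (B, 1, H*W)

# softmax does not commute with the matrix product, so these differ.
print(torch.allclose(out_paper, out_code))  # False in general
```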
@khoshsirat
You are right. The location of the softmax operation in the `channel_pool` function differs from the paper's description.
What's going on?
Which one is correct?
Hi guys, I have created a gist to compare this implementation against External-Attention-pytorch's. In a simple test case, I found that the outputs differ under Kaiming init.
Any idea why?
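One thing worth checking before concluding anything from the comparison: Kaiming init draws fresh random weights for each module, so two implementations will disagree unless their weights are explicitly synchronized (e.g., by copying the state dict across). A minimal sketch, with a plain 1x1 conv standing in for the attention branches:

```python
import torch
import torch.nn as nn

# Two structurally identical modules initialized independently with
# Kaiming init will not agree; output equality only holds if the
# weights match.
def make_branch(c):
    m = nn.Conv2d(c, c, kernel_size=1)
    nn.init.kaiming_normal_(m.weight)
    nn.init.zeros_(m.bias)
    return m

x = torch.randn(1, 64, 32, 32)
a, b = make_branch(64), make_branch(64)
print(torch.allclose(a(x), b(x)))  # False: independent random weights

# Copying the weights across makes the comparison meaningful:
b.load_state_dict(a.state_dict())
print(torch.allclose(a(x), b(x)))  # True: same weights, same math
```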