DeLightCMU/PSA

It seems that the implementations of Channel-only self-attention and Spatial-only self-attention are swapped with each other.


Thanks for sharing your work!

def spatial_pool(self, x):
    input_x = self.conv_v_right(x)
    batch, channel, height, width = input_x.size()
    # [N, IC, H*W]
    input_x = input_x.view(batch, channel, height * width)
    # [N, 1, H, W]
    context_mask = self.conv_q_right(x)
    # [N, 1, H*W]
    context_mask = context_mask.view(batch, 1, height * width)
    # [N, 1, H*W]
    context_mask = self.softmax_right(context_mask)
    # [N, IC, 1]
    # context = torch.einsum('ndw,new->nde', input_x, context_mask)
    context = torch.matmul(input_x, context_mask.transpose(1, 2))
    # [N, IC, 1, 1]
    context = context.unsqueeze(-1)
    # [N, OC, 1, 1]
    context = self.conv_up(context)
    # [N, OC, 1, 1]
    mask_ch = self.sigmoid(context)
    out = x * mask_ch
    return out


It seems that the spatial_pool function is actually the Channel-only Self-Attention module.
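For illustration, here is a self-contained sketch of the same computation with toy layers (the layer names and channel sizes below are illustrative assumptions, not the repo's actual module definition). The resulting mask has shape [N, C, 1, 1], i.e. one weight per channel:

import torch
import torch.nn as nn

# Standalone sketch of the computation in spatial_pool above (toy sizes).
N, C, H, W = 2, 64, 16, 16
x = torch.randn(N, C, H, W)

conv_v_right = nn.Conv2d(C, C // 2, kernel_size=1)  # value projection
conv_q_right = nn.Conv2d(C, 1, kernel_size=1)       # query projection
conv_up = nn.Conv2d(C // 2, C, kernel_size=1)       # channel up-projection
softmax_right = nn.Softmax(dim=2)
sigmoid = nn.Sigmoid()

input_x = conv_v_right(x).view(N, C // 2, H * W)                 # [N, C/2, H*W]
context_mask = softmax_right(conv_q_right(x).view(N, 1, H * W))  # [N, 1, H*W]
context = torch.matmul(input_x, context_mask.transpose(1, 2))    # [N, C/2, 1]
mask_ch = sigmoid(conv_up(context.unsqueeze(-1)))                # [N, C, 1, 1]

print(mask_ch.shape)  # torch.Size([2, 64, 1, 1]): one weight per channel,
                      # i.e. a channel-attention mask, not a spatial one
out = x * mask_ch     # broadcasts over H and W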

Can you explain what you mean in more detail?

@khoshsirat

Does the spatial_pool function implement Channel-only Self-Attention?

Does the channel_pool function implement Spatial-only Self-Attention?

OK, I see it now:
The spatial_pool function should be renamed to channel_pool and the channel_pool function should be renamed to spatial_pool.

I have found another discrepancy too:
In the channel_pool function (which should be renamed to spatial_pool), softmax is applied after the matmul. But in the paper, in the Spatial-only Self-Attention block, softmax is applied before the matmul.
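To make the difference concrete, here is a toy sketch of the two orderings (the tensors below are stand-ins for the pooled query descriptor and the flattened value map, not the repo's actual variables):

import torch
import torch.nn.functional as F

N, Cq, HW = 2, 32, 64
q = torch.randn(N, 1, Cq)   # pooled query descriptor, [N, 1, C/2]
v = torch.randn(N, Cq, HW)  # value map flattened over space, [N, C/2, H*W]

# Paper's description: softmax over the query's channel dimension, then matmul.
paper_order = torch.matmul(F.softmax(q, dim=-1), v)  # [N, 1, H*W]

# This repo's channel_pool: matmul first, then softmax over the H*W positions.
code_order = F.softmax(torch.matmul(q, v), dim=-1)   # [N, 1, H*W]

# The two orderings are not equivalent in general.
print(torch.allclose(paper_order, code_order))  # typically False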

@khoshsirat
You are right.
The location of the softmax operation in the channel_pool function is different from what the paper describes.
What's going on?
Which one is correct?

Hi guys, I have created a gist to compare this implementation against External-Attention-pytorch's. Through a simple test case, I found that the outputs are different with Kaiming init.
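For reference, a minimal sketch of that kind of comparison harness (block_a and block_b stand for the two PSA implementations, both assumed to take an [N, C, H, W] tensor; the constructor calls are omitted):

import torch
import torch.nn as nn

def kaiming_init(m):
    # Re-initialize conv layers with Kaiming-normal weights and zero bias.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def outputs_match(block_a: nn.Module, block_b: nn.Module, seed: int = 0) -> bool:
    # Apply the same init (and seed) to both blocks, feed the same input,
    # and check whether the outputs agree.
    torch.manual_seed(seed)
    block_a.apply(kaiming_init)
    torch.manual_seed(seed)
    block_b.apply(kaiming_init)
    x = torch.randn(1, 64, 32, 32)
    with torch.no_grad():
        return torch.allclose(block_a(x), block_b(x))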

Any idea why?