About attention module
1173206772 opened this issue · 4 comments
Thanks for your great work.
You designed the attention module; in this module, you get the attention vector Va and then simply take a Hadamard product with the feature map to obtain an attention map Ma. Right?
I don't understand the difference between the Va obtained this way and the support prototype vector. In other words, can I simply use the prototype vector instead of your attention vector Va to get the attention map via a Hadamard product with the feature map F^23?
Can you explain it to me?
Best wishes.
I appreciate your interest in our work.
We include an attention module to instruct the model to focus more on class-relevant information. To obtain the attention feature (class-relevant only), we first compute Va: the masked support feature is passed through a Conv layer to get a same-resolution representation that carries only class-relevant information. We then map the produced Va onto F^23 to get the final attention feature. For the prototype vector, by contrast, the masked support feature is squeezed to one dimension using the MAP (masked average pooling) operation. I hope that answers your question.
You can replace Va with the support prototype vector (first bringing it to the same resolution) to compute the attention feature and check the performance difference.
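For reference, here is a minimal sketch (not the repository's exact code) of the masked pooling and Hadamard-product step described above. The variable names `Fs` (support feature), `Ys` (support mask), and `F23` (the query feature map F^23), as well as the mask-resizing step, are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def masked_pool(Fs, Ys):
    # Resize the binary support mask to the feature resolution, mask the
    # support feature, then global-average-pool to shape [B, C, 1, 1]
    # (mirroring the adaptive_avg_pool2d line quoted later in this thread).
    Ys = F.interpolate(Ys, size=Fs.shape[-2:], mode="nearest")
    return F.adaptive_avg_pool2d(Fs * Ys, output_size=(1, 1))

B, C, H, W = 2, 8, 5, 5
Fs  = torch.randn(B, C, H, W)   # support feature map (dummy data)
Ys  = torch.ones(B, 1, H, W)    # binary support mask (dummy data)
F23 = torch.randn(B, C, H, W)   # query feature map F^23 (dummy data)

Va = masked_pool(Fs, Ys)        # attention vector, [B, C, 1, 1]
Ma = Va * F23                   # Hadamard product, broadcast over H x W
print(Ma.shape)                 # torch.Size([2, 8, 5, 5])
```

Since the prototype vector is also a channel-wise vector, swapping it in for `Va` here is shape-compatible; the question is only how the resulting performance compares.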
Sorry, I didn't notice your reply before.
You said Va is the masked support feature followed by a Conv layer to get a same-resolution representation carrying only class-relevant information, but from the code

```python
att = F.adaptive_avg_pool2d(self.mask(Fs, Ys), output_size=(1, 1))
```

we get a tensor with shape `[B, C, 1, 1]`, right? In that case, how do I get a same-resolution representation carrying only class-relevant information?
Thank you for the clarification. Yes, you are right: the average pooling output is (1, 1), and the attention feature at the same resolution is produced after multiplying it with F^23 in the following forward function.
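In other words, the full resolution is recovered by broadcasting during the elementwise multiplication. A quick shape check (a NumPy sketch with made-up sizes, independent of the repository's code):

```python
import numpy as np

B, C, H, W = 2, 8, 5, 5
att = np.random.rand(B, C, 1, 1)   # pooled attention vector, [B, C, 1, 1]
F23 = np.random.rand(B, C, H, W)   # feature map F^23 (dummy data)

# Broadcasting expands att over the spatial dimensions, so the
# Hadamard product restores the full [B, C, H, W] resolution.
Ma = att * F23
print(Ma.shape)                    # (2, 8, 5, 5)
```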
Line 70 in c19e01c
You can try to use prototype vector and can check the performance difference. Thank you
OK, thanks for your reply.