Query - Implementation of weighted feature aggregation
Hi,
I am trying to understand the implementation of equations 5, 6, and 7 in your paper, and would be thankful if you could help.
According to the paper, the support feature fS is flipped before being convolved with R. However, there is a 1x1 convolution (line 247 below) applied to the support feature fS before equation 5 is implemented. Are you using the projected support feature fS in equation 5?
Lines 238 to 262 in de067f9
Additionally, there is another 1x1 convolution (line 261) applied after equation 6 is implemented. Are you using the projected fR to implement equation 7?
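For concreteness, here is a minimal sketch of the flow I am asking about, assuming the projected fS is flipped and then used as a convolution kernel over R. All names and shapes are my own illustrative assumptions, not the repository's code:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: fS is a support feature (C, Hs, Ws),
# R is the feature map it is convolved over (1, C, Hq, Wq).
C, Hs, Ws, Hq, Wq = 64, 5, 5, 32, 32
fS = torch.randn(C, Hs, Ws)
R = torch.randn(1, C, Hq, Wq)

# The 1x1 projection I am asking about (cf. line 247).
proj = torch.nn.Conv2d(C, C, kernel_size=1)
fS_proj = proj(fS.unsqueeze(0)).squeeze(0)                # (C, Hs, Ws)

# Flip the projected support feature spatially and use it as a kernel.
# torch's conv2d computes cross-correlation, so flipping the kernel
# makes it a true convolution, as equation 5 seems to require.
kernel = torch.flip(fS_proj, dims=(-2, -1)).unsqueeze(0)  # (1, C, Hs, Ws)
out = F.conv2d(R, kernel, padding=(Hs // 2, Ws // 2))     # (1, 1, Hq, Wq)
```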
Yes, you are right.
This implementation is adapted from `nn.MultiheadAttention()`.
Since we want to implement this module in a multi-head form, we need to project a feature (C x H x W) into N sub-features (N x C/N x H x W) as N heads. Then we implement weighted feature aggregation within each head, and finally aggregate the features of all heads together.
Thanks for your help. It makes perfect sense now.