Something wrong in Eq. (7) in the manuscript
Hi,
Thanks for your great work!
May I ask whether there is an error in the first term of Eq. (7) in the manuscript? It contains WV twice and no key projection (WK).
Thanks.
Hi @rshaojimmy,
Thanks for the catch. There is a typo in Eq. (7).
The first WV should be WK, since attention is computed between a query and a key.
We will update the arxiv version.
Best,
Raymond
Got it. Thanks!
May I further ask what the second term in Eq. (7) is for? The first term is the self-attention conducted within the region r. Why is a second term needed compared to normal self-attention?
Thanks.
Recall that we defined \mathcal{R} to be a set of patch indices, i.e., it does not contain the region token r(l).
In normal self-attention, each token also attends to itself. Hence, we need the second term.
Side note: We could have defined the set \mathcal{R} to also include the region token, in which case we would only have the first term. However, this would require a single notation for both the patch (f) and the region token (r), which we thought might confuse the reader.
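To make the equivalence concrete, here is a minimal NumPy sketch. It is not the paper's Eq. (7) or the repository code. It assumes single-head scaled dot-product attention with one softmax normalized jointly over the patch scores and the region token's own score. The function names, shapes, and the weight matrices `Wq`, `Wk`, `Wv` are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def region_token_two_terms(r, F, Wq, Wk, Wv):
    """Region token output written as two terms (illustrative assumption).

    r: (d,) region token; F: (n, d) patch tokens indexed by R (r is not in R).
    """
    q = r @ Wq
    scale = np.sqrt(q.shape[-1])
    patch_scores = (F @ Wk) @ q / scale   # scores against the patches in R
    self_score = (r @ Wk) @ q / scale     # score of the region token with itself
    w = softmax(np.append(patch_scores, self_score))
    first_term = w[:-1] @ (F @ Wv)        # attention over the patches
    second_term = w[-1] * (r @ Wv)        # the token attending to itself
    return first_term + second_term


def region_token_single_term(r, F, Wq, Wk, Wv):
    """Same output when the token set is enlarged to include the region token."""
    X = np.vstack([F, r])                 # patches plus the region token
    q = r @ Wq
    scores = (X @ Wk) @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ (X @ Wv)
```

Under these assumptions the two functions return the same vector for any inputs, which is the sense in which the second term only accounts for the region token attending to itself.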
Thanks! But it seems that the manuscript does not explicitly state that \mathcal{R} excludes the region token r(l).
In the paper, "\mathcal{R} denotes a set of patch indices covered by the region".
As a region token does not correspond to a patch, it is not included in \mathcal{R}. We can make this more explicit.
Thanks for pointing this out.
I see. Thanks so much.