mlvlab/SPoTr

Question about Eq. (9)

Rurouni-z opened this issue · 2 comments

$$\mathbb{A}_{q, k, c}=\frac{\exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k}\right) ; \phi_{q k}\right] / \tau\right)_{c}\right)}{\sum_{k^{\prime} \in \Omega_{k e y}} \exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k^{\prime}}\right) ; \phi_{q k^{\prime}}\right] / \tau\right)_{c}\right)}$$

Where can I find the following term in your code?

$$\sum_{k^{\prime} \in \Omega_{k e y}} \exp \left(\mathcal{M}^{\prime}\left(\left[\mathcal{R}^{\prime}\left(\mathbf{f}_{q}, \mathbf{f}_{k^{\prime}}\right) ; \phi_{q k^{\prime}}\right] / \tau\right)_{c}\right)$$

And why do you apply softmax over the last dim of attn? Wouldn't that make it channel attention? I am a newbie, I hope you can help me, thank you very much! :)

I get it: softmax(x)_i = exp(x_i) / sum_j exp(x_j). The sum in the denominator is computed inside the softmax itself, which is why there is no explicit sum() in the code.
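A quick sanity check of that point (generic PyTorch, not from this repo), showing the denominator's sum is hidden inside the softmax call:

```python
import torch

x = torch.randn(5)

# Explicit form: exp(x_i) / sum_j exp(x_j)
manual = torch.exp(x) / torch.exp(x).sum()

# Built-in softmax computes the same normalization internally,
# so no sum() appears at the call site.
builtin = torch.softmax(x, dim=0)

print(torch.allclose(manual, builtin))  # True
```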

I still don't understand, though, why you take the softmax over key_points_nums rather than over the channel dim.

Here is an explanation from GPT:

When you apply softmax to the last dimension of a three-dimensional tensor (the key_points_nums dimension), each length-key_points_nums vector is normalized independently, one vector for every (dim, query_point_nums) index pair.
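A small demonstration of that point (the shapes are made up for illustration):

```python
import torch

# Hypothetical shapes: C channels ("dim"), Q query points, K key points.
C, Q, K = 4, 8, 16
attn = torch.randn(C, Q, K)

# Softmax over the last dim (key_points_nums): every attn[c, q, :]
# vector is normalized independently to sum to 1, so the weights
# compete across keys, not across channels.
attn = torch.softmax(attn, dim=-1)

print(attn.sum(dim=-1))  # all ones, shape [C, Q]
```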

PJin0 commented

Since we'd like to softly select the channel of each point, we perform the softmax with respect to key_points_num.
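If it helps, here is a minimal, self-contained sketch of Eq. (9). This is toy code, not the SPoTr implementation: the subtraction relation for $\mathcal{R}^{\prime}$, the two-layer MLP for $\mathcal{M}^{\prime}$, and all shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Assumed shapes: Q query points, K key points, C channels.
Q, K, C = 8, 16, 32
tau = 1.0  # softmax temperature

f_q = torch.randn(Q, C)     # query features f_q
f_k = torch.randn(K, C)     # key features f_k
phi = torch.randn(Q, K, C)  # positional encoding phi_qk (assumed shape)

# R'(f_q, f_k): a simple subtraction relation, used here as a placeholder.
rel = f_q[:, None, :] - f_k[None, :, :]  # [Q, K, C]

# M': an MLP over the concatenation [R'(f_q, f_k); phi_qk] scaled by tau,
# mapping 2C -> C so the output gives one logit per channel c.
mlp = nn.Sequential(nn.Linear(2 * C, C), nn.ReLU(), nn.Linear(C, C))
logits = mlp(torch.cat([rel, phi], dim=-1) / tau)  # [Q, K, C]

# Softmax over the key dimension: for each query q and channel c, the
# weights over keys k' sum to 1. This normalization is the sum over
# k' in Omega_key in the denominator of Eq. (9).
A = torch.softmax(logits, dim=1)  # [Q, K, C]

print(A.sum(dim=1))  # all ones, shape [Q, C]
```

In a [dim, query_points, key_points] layout like the one discussed above, the key dimension is the last one, so softmax with dim=-1 is this same operation: it normalizes over the keys for each channel, not over the channels themselves.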