VITA-Group/GNT

Implementation details of view transformer

Zhentao-Liu opened this issue · 3 comments

In the provided code, attn = k - q[:,:,None,:] + pos is followed by attn = self.attn_fc(attn). However, in Fig. 2(a) and Alg. 1 there is no self.attn_fc component. Could you give an explanation?

This part of the code is in transformer_network.py, class Attention2D.
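For context, here is a minimal sketch of how the two quoted lines might fit into a subtraction-based attention module. This is not the actual GNT implementation: the class name SubtractionAttention, the out_fc projection, the layer sizes, and the tensor shapes (rays x samples x views x dim) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubtractionAttention(nn.Module):
    """Illustrative stand-in for the attention discussed above (shapes assumed)."""

    def __init__(self, dim):
        super().__init__()
        # attn_fc maps the (k - q + pos) difference vector to attention logits;
        # this is the extra linear layer the question refers to.
        self.attn_fc = nn.Linear(dim, dim)
        self.out_fc = nn.Linear(dim, dim)

    def forward(self, q, k, v, pos):
        # q: [n_rays, n_samples, dim]; k, v, pos: [n_rays, n_samples, n_views, dim]
        # Subtraction-based scores instead of dot products, as in the quoted code.
        attn = k - q[:, :, None, :] + pos     # [n_rays, n_samples, n_views, dim]
        attn = self.attn_fc(attn)             # the extra projection under discussion
        attn = F.softmax(attn, dim=-2)        # normalize over the view axis
        out = (attn * (v + pos)).sum(dim=-2)  # aggregate values across views
        return self.out_fc(out)


# Hypothetical usage with made-up sizes.
q = torch.randn(2, 64, 32)
k = v = pos = torch.randn(2, 64, 8, 32)
print(SubtractionAttention(32)(q, k, v, pos).shape)  # torch.Size([2, 64, 32])
```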

In Eq. 9, what do you mean by applying diag(.)?

Hi @Zhentao-Liu!

Thank you for pointing it out! Yes, there is an error in our pseudo-code in Algorithm 1 (although f_a(.) was defined, we never used it). However, our implementation details in the text do discuss this (Appendix B - Memory-efficient Cross-View Attention).