VITA-Group/GNT

Implementation details of view transformer

Zhentao-Liu opened this issue · 3 comments

In the provided code, attn = k - q[:,:,None,:] + pos is followed by attn = self.attn_fc(attn). However, in Fig. 2(a) and Alg. 1 there is no self.attn_fc component. Could you give an explanation?

This part of the code is in transformer_network.py, class Attention2D.
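For context, here is a minimal sketch of how the two quoted lines might fit into a subtraction-based attention module. This is not the actual GNT implementation: the class name SubtractionAttention, the out_fc projection, the layer sizes, and the tensor shapes (rays x samples x views x dim) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubtractionAttention(nn.Module):
    """Illustrative stand-in for the attention discussed above (shapes assumed)."""

    def __init__(self, dim):
        super().__init__()
        # attn_fc maps the (k - q + pos) difference vector to attention logits;
        # this is the extra linear layer the question refers to.
        self.attn_fc = nn.Linear(dim, dim)
        self.out_fc = nn.Linear(dim, dim)

    def forward(self, q, k, v, pos):
        # q: [n_rays, n_samples, dim]; k, v, pos: [n_rays, n_samples, n_views, dim]
        # Subtraction-based scores instead of dot products, as in the quoted code.
        attn = k - q[:, :, None, :] + pos     # [n_rays, n_samples, n_views, dim]
        attn = self.attn_fc(attn)             # the extra projection under discussion
        attn = F.softmax(attn, dim=-2)        # normalize over the view axis
        out = (attn * (v + pos)).sum(dim=-2)  # aggregate values across views
        return self.out_fc(out)


# Hypothetical usage with made-up sizes.
q = torch.randn(2, 64, 32)
k = v = pos = torch.randn(2, 64, 8, 32)
print(SubtractionAttention(32)(q, k, v, pos).shape)  # torch.Size([2, 64, 32])
```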

In Eq. 9, what do you mean by applying diag(.)?

Hi @Zhentao-Liu!

Thank you for pointing it out! Yes, there is an error in our pseudo-code in Algorithm 1 (although f_a(.) was defined, we never used it). However, our implementation details in the text do discuss this (Appendix B - Memory-efficient Cross-View Attention).