wzzheng/TPVFormer

The CVHA proposed in the TPVFormer paper is not implemented in the code


Hi, first of all I would like to express my appreciation for the impressive work presented in your recent paper on TPVFormer. The concept of a tri-perspective view (TPV) representation and the proposed CVHA (Cross-View Hybrid Attention) mechanism for exchanging information between views are both novel and intriguing.

After carefully examining the implementation in the TPVFormer repository, I noticed that the CVHA mechanism described in the paper is not fully implemented (this was also raised in #29). The code only applies self-attention on the HW plane and does not incorporate the cross-view hybrid attention (TPV self-attention) outlined in the paper; a rough sketch of what I mean is included after the questions below. I would like to kindly ask the following (different from #29):

  1. Have you tried implementing CVHA? (If so, why not include it in the code, even as a commented-out section, so that people could experiment with it?)
  2. Does CVHA make no measurable difference in performance?
  3. Or did you leave it out because of GPU memory consumption?
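
For concreteness, here is a minimal sketch of the kind of cross-view information exchange I have in mind. It uses plain `nn.MultiheadAttention` instead of the deformable attention the paper builds on, and the class name `CrossViewHybridAttention`, the plane names (`hw`, `zh`, `wz`), and the toy shapes are purely my own illustrative assumptions, not the repository's actual API:

```python
import torch
import torch.nn as nn

class CrossViewHybridAttention(nn.Module):
    """Simplified illustration of cross-view attention between TPV planes.

    NOTE: this is NOT the paper's CVHA. The paper samples reference
    points on all three planes with deformable attention; here the idea
    is approximated with standard multi-head attention for readability.
    """

    def __init__(self, embed_dims=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dims, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dims)

    def forward(self, tpv_hw, tpv_zh, tpv_wz):
        # Each plane is a tensor of flattened spatial tokens: (B, N_plane, C).
        # Queries come from the HW plane; keys/values are the concatenation
        # of all three planes, so HW features can exchange information
        # with the ZH and WZ planes (the "cross-view" part).
        context = torch.cat([tpv_hw, tpv_zh, tpv_wz], dim=1)
        out, _ = self.attn(query=tpv_hw, key=context, value=context)
        return self.norm(tpv_hw + out)  # residual connection + layer norm

# Usage with toy shapes (batch of 2, embedding dim 256):
if __name__ == "__main__":
    B, C = 2, 256
    hw = torch.randn(B, 100 * 100, C)  # H*W tokens
    zh = torch.randn(B, 8 * 100, C)    # Z*H tokens
    wz = torch.randn(B, 100 * 8, C)    # W*Z tokens
    cvha = CrossViewHybridAttention(embed_dims=C)
    print(cvha(hw, zh, wz).shape)      # torch.Size([2, 10000, 256])
```

Even this naive version would let each plane see the other two, which is why I am curious whether the full CVHA was dropped for performance or memory reasons.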

Thanks!