TPVFormer's CVHA, as proposed in the paper, is not implemented in the code
sathiiii opened this issue · 0 comments
Hi, first of all I would like to express my appreciation for the impressive work presented in your recent TPVFormer paper. The tri-perspective view (TPV) representation and the proposed CVHA (Cross-View Hybrid Attention) mechanism for exchanging information between the views are both novel and intriguing.
After carefully examining the code in the TPVFormer repository, I noticed that the CVHA mechanism described in the paper is not fully implemented (this was also raised in #29). The code only applies self-attention on the HW plane and does not include the cross-view hybrid attention (TPV self-attention) outlined in the paper; a rough sketch of what I expected is included after the questions below. I would like to kindly ask the following (distinct from #29):
- Have you tried implementing CVHA? (If so, why not include it in the code, even as a commented-out section, so that people could test it?)
- Does CVHA make no measurable difference in performance?
- Or did you decide to omit CVHA because of its GPU memory consumption?
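
For context, here is a minimal sketch of the kind of cross-view attention I had in mind. This is only an illustration under my own assumptions: it uses plain multi-head attention rather than the deformable attention the repo builds on, and the class name, plane names, and token shapes are hypothetical, not taken from the paper or the codebase.

```python
import torch
import torch.nn as nn

class CrossViewHybridAttentionSketch(nn.Module):
    """Hypothetical CVHA sketch: queries from each TPV plane attend to the
    tokens of all three planes jointly. NOT the paper's deformable
    formulation -- plain multi-head attention, for illustration only."""

    def __init__(self, embed_dims=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dims, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dims)

    def forward(self, tpv_hw, tpv_zh, tpv_wz):
        # tpv_*: (B, N_plane, C) flattened tokens of each TPV plane
        kv = torch.cat([tpv_hw, tpv_zh, tpv_wz], dim=1)  # keys/values from all planes
        out = []
        for q in (tpv_hw, tpv_zh, tpv_wz):
            attn_out, _ = self.attn(q, kv, kv)  # each plane queries all three planes
            out.append(self.norm(q + attn_out))  # residual + norm
        return out


# Toy usage with made-up plane resolutions (H = W = 100, Z = 8):
B, C = 2, 256
hw = torch.randn(B, 100 * 100, C)
zh = torch.randn(B, 8 * 100, C)
wz = torch.randn(B, 100 * 8, C)
cvha = CrossViewHybridAttentionSketch(C)
hw2, zh2, wz2 = cvha(hw, zh, wz)
```

Even a rough version like this (or a commented-out one) in the repo would make it much easier to reproduce and test the full TPV self-attention from the paper.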
Thanks!