The computational cost of deformable attention
linjing7 opened this issue · 2 comments
linjing7 commented
Hi, thanks for your excellent work.
I notice that the number of sampled keys/values is the same as the number of queries. Therefore, the computational cost of deformable attention is the same as that of global attention, is that right? So I'm curious why you don't use global self-attention in the last two stages.
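To make the claim concrete, here is a minimal sketch of the dominant multiply-add count in scaled dot-product attention: both the QK^T similarity and the attention-weighted sum over values cost roughly N_q × N_k × d each. When the number of sampled keys/values N_k equals the number of queries N_q, the deformable variant matches global attention in this count. The feature-map size (14×14) and channel dimension (384) below are illustrative assumptions, not the paper's exact configuration.

```python
def attention_flops(num_queries, num_keys, dim):
    # QK^T similarities plus the attention-weighted sum over values,
    # each ~ num_queries * num_keys * dim multiply-adds
    return 2 * num_queries * num_keys * dim

# Hypothetical stage: a 14x14 feature map (196 tokens), channel dim 384
N, d = 14 * 14, 384

global_cost = attention_flops(N, N, d)      # every query attends to all N keys
deformable_cost = attention_flops(N, N, d)  # number of sampled keys == N here

print(deformable_cost == global_cost)  # True: same dominant cost
```

This ignores the (comparatively small) cost of predicting sampling offsets and bilinearly interpolating the sampled features, which deformable attention adds on top.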
Vladimir2506 commented
Thanks for your question. That's right. Please refer to Table 6 in the ablation study section of the paper. In fact, the settings in the paper may not be optimal, and we may find better configurations as we conduct more detailed investigations.
linjing7 commented
Okay, thank you very much.