LeapLabTHU/DAT

The computational cost of deformable attention

linjing7 opened this issue · 2 comments

Hi, thanks for your excellent work.
I notice that the number of sampled keys/values is the same as the number of queries. Therefore, the computational cost of deformable attention is the same as that of global attention, is that right? So I'm curious why you don't use global self-attention in the last two stages?

Thanks for your question. That's right. Please refer to Table 6 in the ablation study section of the paper. In fact, the settings in the paper may not be optimal, and we may find better configurations as we conduct more detailed investigations these days.
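To make the cost equivalence concrete: a minimal back-of-the-envelope sketch, counting only the two attention matmuls (QKᵀ and attn·V) and ignoring projections and the offset network. The function name and the 14×14 / 64-dim figures are illustrative assumptions, not values from the paper.

```python
def attention_matmul_flops(num_queries: int, num_keys: int, dim: int) -> int:
    """Multiply-add count for QK^T plus attn @ V, ignoring projections."""
    # QK^T costs N*M*d multiply-adds; attn @ V costs another N*M*d.
    return 2 * num_queries * num_keys * dim

N, d = 14 * 14, 64  # e.g. a 14x14 feature map with head dimension 64

# Global attention: every query attends to all N positions (M = N).
global_flops = attention_matmul_flops(N, N, d)

# Deformable attention with as many sampled keys/values as queries (M = N):
# the matmul cost is identical to global attention.
deformable_flops = attention_matmul_flops(N, N, d)

print(global_flops == deformable_flops)  # True
```

So with an equal number of sampled points, the savings come not from FLOPs in the attention matmuls but from where the keys/values are sampled; Table 6 in the paper is where the stage-wise design choices are ablated.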

Okay, thank you very much.