LeapLabTHU/DAT

The computational cost of deformable attention

linjing7 opened this issue · 2 comments

Hi, thanks for your excellent work.
I notice that the number of sampled keys/values is the same as the number of queries. Therefore, the computational cost of deformable attention is the same as that of global attention, is that right? So I'm curious why you don't use global self-attention in the last two stages?

Thanks for your question. That's right. Please refer to Table 6 in the ablation study section of the paper. In fact, the settings in the paper may not be optimal, and we may find better configurations as we conduct more detailed investigations these days.
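To make the cost equivalence concrete: a minimal back-of-the-envelope sketch, counting only the two attention matmuls (QKᵀ and attn·V) and ignoring projections and the offset network. The function name and the 14×14 / 64-dim figures are illustrative assumptions, not values from the paper.

```python
def attention_matmul_flops(num_queries: int, num_keys: int, dim: int) -> int:
    """Multiply-add count for QK^T plus attn @ V, ignoring projections."""
    # QK^T costs N*M*d multiply-adds; attn @ V costs another N*M*d.
    return 2 * num_queries * num_keys * dim

N, d = 14 * 14, 64  # e.g. a 14x14 feature map with head dimension 64

# Global attention: every query attends to all N positions (M = N).
global_flops = attention_matmul_flops(N, N, d)

# Deformable attention with as many sampled keys/values as queries (M = N):
# the matmul cost is identical to global attention.
deformable_flops = attention_matmul_flops(N, N, d)

print(global_flops == deformable_flops)  # True
```

So with an equal number of sampled points, the savings come not from FLOPs in the attention matmuls but from where the keys/values are sampled; Table 6 in the paper is where the stage-wise design choices are ablated.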

Okay, thank you very much.