What's the difference between DAT's attention and Deformable DETR's attention?

Question

What's the difference between DAT's attention and Deformable DETR's attention?

lucasjinreal opened this issue 2 years ago · 3 comments

Answer 1 · 2022-09-14T13:29:48.000Z

@jinfagang Please check Part A of the Appendix in our paper.

Answer 2 · 2022-09-14T15:22:09.000Z

@Vladimir2506 thanks for pointing it out. I see that DAT's code doesn't use any custom C++ layer or op, while Deformable DETR uses custom code to implement its deformable attention. Why?

Answer 3 · 2022-10-02T17:02:21.000Z

@jinfagang Because the spatial sampling operation in DAT is relatively simple and can be implemented directly with `F.grid_sample(feature, pos)` at a reasonable speed, while Deformable DETR provides a more optimized CUDA op that handles different numbers of keys. An optimized CUDA implementation could be written for DAT as well if low latency is required.
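
For readers who want to see what the `F.grid_sample` approach looks like in practice, here is a minimal sketch, not the actual DAT code. The helper names `make_reference_grid` and `sample_deformed_features`, the tensor shapes, and the choice of `align_corners=True` are assumptions made for illustration. The offsets are random here; in a real model they would be predicted by a small network.

```python
import torch
import torch.nn.functional as F


def make_reference_grid(batch, hk, wk):
    """Uniform reference points in normalized (x, y) coordinates in [-1, 1].

    Hypothetical helper: returns a tensor of shape (batch, hk, wk, 2).
    """
    ys = torch.linspace(-1.0, 1.0, hk)
    xs = torch.linspace(-1.0, 1.0, wk)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # grid_sample expects the last dimension ordered as (x, y).
    grid = torch.stack((grid_x, grid_y), dim=-1)
    return grid.unsqueeze(0).expand(batch, -1, -1, -1)


def sample_deformed_features(feature, ref_points, offsets):
    """Bilinearly sample `feature` at `ref_points + offsets`.

    feature:    (B, C, H, W) feature map
    ref_points: (B, Hk, Wk, 2) normalized (x, y) reference points
    offsets:    (B, Hk, Wk, 2) offsets in the same normalized units
    returns:    (B, C, Hk, Wk) sampled features
    """
    # Keep the deformed points inside the feature map (an assumption of this sketch).
    pos = (ref_points + offsets).clamp(-1.0, 1.0)
    return F.grid_sample(feature, pos, mode="bilinear", align_corners=True)


if __name__ == "__main__":
    feature = torch.randn(2, 8, 16, 16)
    ref = make_reference_grid(batch=2, hk=4, wk=4)
    offsets = 0.1 * torch.randn(2, 4, 4, 2)  # stand-in for learned offsets
    sampled = sample_deformed_features(feature, ref, offsets)
    print(sampled.shape)
```

Since the whole sampling step is a single built-in call that already runs on the GPU and supports autograd, no custom C++/CUDA extension has to be compiled for this path.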