LeapLabTHU/DAT

what's the difference between DAT and deformable DETR's attention?

lucasjinreal opened this issue · 3 comments

what's the difference between DAT and deformable DETR's attention?

@jinfagang Please check Part A in the Appendix of our paper.

@Vladimir2506 Thanks for pointing that out. I see that DAT's code doesn't use any custom C++ layer or op, while Deformable DETR uses custom code to implement its deformable attention. Why?

@jinfagang Because the spatial sampling operation in DAT is relatively simple, it can be implemented directly with F.grid_sample(feature, pos) at a reasonable speed, whereas Deformable-DETR provides a more optimized CUDA op for different numbers of keys. An optimized CUDA implementation for DAT would therefore be possible if low latency is required.
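
For readers wondering what the F.grid_sample(feature, pos) call looks like in practice, here is a minimal sketch. It is not the code from the DAT repository: the helper names (make_reference_grid, sample_deformed_features), the tensor shapes, and the align_corners=True setting are assumptions chosen for illustration. It only shows how reference points plus learned offsets can be sampled with the stock PyTorch op, with no custom CUDA kernel.

```python
import torch
import torch.nn.functional as F


def make_reference_grid(batch, hk, wk):
    """Hypothetical helper: uniform reference points in normalized [-1, 1] coords."""
    ys = torch.linspace(-1.0, 1.0, hk)
    xs = torch.linspace(-1.0, 1.0, wk)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # grid_sample expects the last dimension ordered as (x, y).
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (hk, wk, 2)
    return grid.unsqueeze(0).expand(batch, -1, -1, -1)  # (batch, hk, wk, 2)


def sample_deformed_features(feature, ref_points, offsets):
    """Hypothetical helper: bilinearly sample `feature` at ref_points + offsets.

    feature:    (B, C, H, W)
    ref_points: (B, Hk, Wk, 2), normalized (x, y) in [-1, 1]
    offsets:    (B, Hk, Wk, 2), same units as ref_points
    returns:    (B, C, Hk, Wk)
    """
    pos = (ref_points + offsets).clamp(-1.0, 1.0)
    # align_corners=True is an assumption here; it maps -1 and 1 to the
    # centers of the first and last pixels.
    return F.grid_sample(feature, pos, mode="bilinear", align_corners=True)


# Toy input: a single-channel 4x4 feature map.
feature = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
ref = make_reference_grid(batch=1, hk=4, wk=4)

# With zero offsets the sampled map reproduces the input.
sampled = sample_deformed_features(feature, ref, torch.zeros_like(ref))

# Shifting every point by one pixel in x (2 / (W - 1) in normalized units)
# reads the neighbouring column instead.
shift = torch.zeros_like(ref)
shift[..., 0] = 2.0 / 3.0
shifted = sample_deformed_features(feature, ref, shift)
```

In a real model the offsets would come from a small learned network rather than being set by hand. Because everything goes through a standard differentiable op, gradients flow to the offsets without custom backward code.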