what's the difference between DAT and deformable DETR's attention?
lucasjinreal opened this issue · 3 comments
lucasjinreal commented
what's the difference between DAT and deformable DETR's attention?
Vladimir2506 commented
@jinfagang Please check Part A of the Appendix in our paper.
lucasjinreal commented
@Vladimir2506 Thanks for pointing that out. I see that DAT's code doesn't use any custom C++ layer or op, while Deformable DETR uses custom code to implement its deformable attention. Why?
Vladimir2506 commented
@jinfagang Because the spatial sampling operation in DAT is relatively simple and can be implemented directly with F.grid_sample(feature, pos) at a reasonable speed, while Deformable DETR provides a more optimized CUDA op that handles different numbers of keys. An optimized CUDA implementation for DAT would therefore be possible if low latency is required.
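For illustration, here is a minimal sketch of this kind of sampling using only built-in PyTorch ops. It is not the code from the DAT repository: the function name `sample_at_offsets`, the tensor shapes, the uniform reference grid, the clamping, and `align_corners=True` are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def sample_at_offsets(feature: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample `feature` at a uniform reference grid shifted by `offsets`.

    Hypothetical helper for illustration only.
    feature: (B, C, H, W) feature map.
    offsets: (B, Hk, Wk, 2) shifts in normalized [-1, 1] coordinates, (x, y) order.
    Returns: (B, C, Hk, Wk) sampled features.
    """
    _, hk, wk, _ = offsets.shape
    # Reference points: a uniform grid over the normalized [-1, 1] range.
    ys = torch.linspace(-1.0, 1.0, hk, dtype=feature.dtype, device=feature.device)
    xs = torch.linspace(-1.0, 1.0, wk, dtype=feature.dtype, device=feature.device)
    ref_y, ref_x = torch.meshgrid(ys, xs, indexing="ij")
    # grid_sample expects the last dimension ordered as (x, y).
    ref = torch.stack((ref_x, ref_y), dim=-1)  # (Hk, Wk, 2)
    # Shift the reference points and keep them inside the feature map.
    pos = (ref.unsqueeze(0) + offsets).clamp(-1.0, 1.0)  # (B, Hk, Wk, 2)
    return F.grid_sample(feature, pos, mode="bilinear", align_corners=True)
```

With zero offsets and a sampling grid of the same size as the feature map, this should return the input unchanged, which makes for a quick sanity check. Because the whole operation is a single `F.grid_sample` call, no custom C++ or CUDA extension has to be compiled.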