
A collection of papers on Transformers for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV).

Awesome Detection Transformer

This is a collection of papers on detection and segmentation with Transformers. We have reorganized the repo by research field.
If you find any overlooked papers or resources, please open an issue or a pull request (recommended).


Papers

DETR

[DETR] End-to-End Object Detection with Transformers.
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
ECCV 2020. [paper] [code]

Object Detection

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
arxiv 2022. [paper] [code]

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
CVPR 2022. [paper] [code]

Accelerating DETR Convergence via Semantic-Aligned Matching.
Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, Shijian Lu.
CVPR 2022. [paper] [code]

DETReg: Unsupervised Pretraining with Region Priors for Object Detection.
Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson.
CVPR 2022. [paper] [code]

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.
Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang.
ICLR 2022. [paper] [code]

ViDT: An Efficient and Effective Fully Transformer-based Object Detector.
Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang.
ICLR 2022. [paper] [code]

CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection.
Xipeng Cao, Peng Yuan, Bailan Feng, Kun Niu.
AAAI 2022. [paper]

FP-DETR: Detection Transformer Advanced by Fully Pre-training.
Wen Wang, Yang Cao, Jing Zhang, Dacheng Tao.
ICLR 2022. [paper]

D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention.
Junyu Lin, Xiaofeng Mao, Yuefeng Chen, Lei Xu, Yuan He, Hui Xue.
arxiv 2022. [paper] [code]

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity.
Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim.
ICLR 2022. [paper] [code]

Anchor DETR: Query Design for Transformer-Based Object Detection.
Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun.
AAAI 2022. [paper] [code]

[YOLOS] You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection.
Yuxin Fang*, Bencheng Liao*, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
NeurIPS 2021. [paper] [code]

Dynamic DETR: End-to-End Object Detection With Dynamic Attention.
Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang.
ICCV 2021. [paper]

PnP-DETR: Towards Efficient Visual Analysis with Transformers.
Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
ICCV 2021. [paper] [code]

WB-DETR: Transformer-Based Detector without Backbone.
Fanfan Liu, Haoran Wei, Wenzhe Zhao, Guozhen Li, Jingquan Peng, Zihao Li.
ICCV 2021. [paper]

Conditional DETR for Fast Training Convergence.
Depu Meng*, Xiaokang Chen*, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
ICCV 2021. [paper] [code]

Rethinking Transformer-based Set Prediction for Object Detection.
Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani.
ICCV 2021. [paper] [code]

Fast Convergence of DETR with Spatially Modulated Co-Attention.
Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li.
ICCV 2021. [paper] [code]

Efficient DETR: Improving End-to-End Object Detector with Dense Prior.
Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang.
arxiv 2021. [paper]

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers.
Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen.
CVPR 2021. [paper] [code]

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
ICLR 2021. [paper] [code]

Open Vocabulary Object Detection

OW-DETR: Open-world Detection Transformer.
Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah.
CVPR 2022. [paper] [code]

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding.
Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, Nicolas Carion.
ICCV 2021. [paper] [code]

3D Object Detection

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers.
Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai.
CVPR 2022. [paper] [code]

Omni-DETR: Omni-Supervised Object Detection with Transformers.
Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto.
CVPR 2022. [paper]

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
Renrui Zhang, Han Qiu, Tai Wang, Xuanzhuo Xu, Ziyu Guo, Yu Qiao, Peng Gao, Hongsheng Li.
arxiv 2022. [paper] [code]

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer.
Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu.
CVPR 2022. [paper] [code]

[VoxSeT] Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds.
Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang.
CVPR 2022. [paper] [code]

[SST] Embracing Single Stride 3D Object Detector with Sparse Transformer.
Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang.
CVPR 2022. [paper] [code]

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries.
Yue Wang, Vitor Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon.
CoRL 2021. [paper] [code]

[VOTR] Voxel Transformer for 3D Object Detection.
Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu.
ICCV 2021. [paper] [code]

[SRDet] Suppress-and-Refine Framework for End-to-End 3D Object Detection.
Zili Liu, Guodong Xu, Honghui Yang, Minghao Chen, Kuoliang Wu, Zheng Yang, Haifeng Liu, Deng Cai.
arxiv 2021. [paper] [code]

[3DETR] An End-to-End Transformer Model for 3D Object Detection.
Ishan Misra, Rohit Girdhar, Armand Joulin.
ICCV 2021. [paper] [code]

[GroupFree3D] Group-Free 3D Object Detection via Transformers.
Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.
ICCV 2021. [paper] [code]

Segmentation

[Mask2Former] Masked-attention Mask Transformer for Universal Image Segmentation.
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
arxiv 2021. [paper] [code]

[MaskFormer] Per-Pixel Classification is Not All You Need for Semantic Segmentation.
Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
NeurIPS 2021. [paper] [code]

Benchmarks

COCO Detection on Papers with Code.

Semantic Segmentation on Papers with Code.

3D Object Detection on Papers with Code.

Acknowledgements

We thank all the authors above for their great work!