Awesome-BEV-Papers

A curated list of BEV-related papers. I have also collected DETR-related papers here, as they are closely related to most recent BEV papers.

I am reading BEV-related papers intensively these days, so expect this list to be updated frequently.

BEV 3D Object Detection related

  • [AutoAlignV2] AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection.
    Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao.
    In . [2207.10316] [zehuichen123/AutoAlignV2]
  • [ORA3D] ORA3D: Overlap Region Aware Multi-view 3D Object Detection.
    Wonseok Roh, Gyusam Chang, Seokha Moon, Giljoo Nam, Chanyoung Kim, Younghyun Kim, Sangpil Kim, Jinkyu Kim.
    In . [2207.00865]
  • [PolarFormer] PolarFormer: Multi-camera 3D Object Detection with Polar Transformers.
    Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang.
    In . [2206.15398]
  • [SRCN3D] SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving.
    Yining Shi, Jingyan Shen, Yifan Sun, Yunlong Wang, Jiaxin Li, Shiqi Sun, Kun Jiang, Diange Yang.
    In . [2206.14451] [synsin0/SRCN3D]
  • [PolarDETR] Polar Parametrization for Vision-based Surround-View 3D Detection.
    Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, Wenyu Liu.
    In . [2206.10965] [hustvl/PolarDETR]
  • [BEVDepth] BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection.
    Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li.
    In . [2206.10092] [Megvii-BaseDetection/BEVDepth]
  • [Ego3RT] Learning Ego 3D Representation as Ray Tracing.
    Jiachen Lu, Zheyuan Zhou, Xiatian Zhu, Hang Xu, Li Zhang.
    In . [2206.04042] [fudan-zvg/Ego3RT]
  • [PETRv2] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.
    Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Qi Gao, Tiancai Wang, Xiangyu Zhang, Jian Sun.
    In . [2206.01256] [megvii-research/PETR]
  • [UVTR] Unifying Voxel-based Representation with Transformer for 3D Object Detection
    Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia.
    In . [2206.00630] [dvlab-research/UVTR]
  • [BEVFusion2] BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
    Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, Zhi Tang.
    In . [2205.13790] [ADLab-AutoDrive/BEVFusion]
  • [BEVFusion1] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
    Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, Song Han.
    In . [2205.13542] [mit-han-lab/bevfusion]
  • [BEVerse] BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving.
    Yunpeng Zhang, Zheng Zhu, Wenzhao Zheng, Junjie Huang, Guan Huang, Jie Zhou, Jiwen Lu.
    In . [2205.09743] [zhangyp15/BEVerse]
  • [MUTR3D] MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries.
    Tianyuan Zhang, Xuanyao Chen, Yue Wang, Yilun Wang, Hang Zhao.
    In CVPRW 2022. [2205.00613] [a1600012888/MUTR3D]
  • [Graph-DETR3D] Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection
    Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao.
    In . [2204.11582]
  • [M2BEV] M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation
    Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez.
    In . [2204.05088] [NVlabs/M2BEV]
  • [BEVFormer] BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai.
    In . [2203.17270] [zhiqi-li/BEVFormer]
  • [BEVDet4D] BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection.
    Junjie Huang, Guan Huang.
    In . [2203.17054] [HuangJunJie2017/BEVDet]
  • [PETR] PETR: Position Embedding Transformation for Multi-View 3D Object Detection
    Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun.
    In . [2203.05625] [megvii-research/PETR]
  • [BEVDet] BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View.
    Junjie Huang, Guan Huang, Zheng Zhu, Dalong Du.
    In . [2112.11790] [HuangJunJie2017/BEVDet]
  • [DETR3D] DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries.
    Yue Wang, Vitor Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon.
    In CoRL 2021. [2110.06922] [wangyueft/detr3d]

BEV Segmentation related

  • [UniFormer] Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View.
    Zequn Qin, Jingyu Chen, Chao Chen, Xiaozhi Chen, Xi Li.
    In . [2207.08536]
  • [CoBEVT] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers.
    Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma.
    In . [2207.02202]
  • [Simple Baseline] A Simple Baseline for BEV Perception Without LiDAR.
    Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki.
    In . [2206.07959]
  • [GKT] Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer
    Shaoyu Chen, Tianheng Cheng, Xinggang Wang, Wenming Meng, Qian Zhang, Wenyu Liu.
    In . [2206.04584] [hustvl/GKT]
  • [ViT-BEVSeg] ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation.
    Pramit Dutta, Ganesh Sistu, Senthil Yogamani, Edgar Galván, John McDonald.
    In WCCI 2022. [2205.15667]
  • [Cross-view Transformers] Cross-view Transformers for real-time Map-view Semantic Segmentation.
    Brady Zhou, Philipp Krähenbühl.
    In CVPR 2022. [2205.02833] [bradyz/cross_view_transformers]
  • [GitNet] GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation.
    Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai.
    In . [2204.07733]
  • [HFT] HFT: Lifting Perspective Representations via Hybrid Feature Transformation.
    Jiayu Zou, Junrui Xiao, Zheng Zhu, Junjie Huang, Guan Huang, Dalong Du, Xingang Wang.
    In . [2204.05068] [JiayuZou2020/HFT]
  • [PersFormer] PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark.
    Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan.
    In . [2203.11089] [OpenPerceptionX/PersFormer_3DLane]
  • [BEVSegFormer] BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs.
    Lang Peng, Zhirong Chen, Zhangjie Fu, Pengpeng Liang, Erkang Cheng.
    In . [2203.04050]
  • [STSU] Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images.
    Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool.
    In ICCV 2021. [2110.01997] [ybarancan/STSU]
  • [TIM] Translating Images into Maps.
    Avishkar Saha, Oscar Mendez Maldonado, Chris Russell, Richard Bowden.
    In ICRA 2022. [2110.00966] [avishkarsaha/translating-images-into-maps]
  • [NEAT] NEAT: Neural Attention Fields for End-to-End Autonomous Driving.
    Kashyap Chitta, Aditya Prakash, Andreas Geiger.
    In ICCV 2021. [2109.04456]
  • [BEV Panoptic] Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images
    Nikhil Gosala, Abhinav Valada.
    In RA-L 2021. [2108.03227] [code]
  • [Disentangling and Vectorization] Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras.
    Zizhang Wu, Wenkai Zhang, Jizheng Wang, Man Wang, Yuanzhu Gan, Xinchao Gou, Muqing Fang, Jing Song.
    In IROS 2021. [2107.08862]
  • [HDMapNet] HDMapNet: An Online HD Map Construction and Evaluation Framework
    Qi Li, Yue Wang, Yilun Wang, Hang Zhao.
    In ICRA 2022. [2107.06307] [Tsinghua-MARS-Lab/HDMapNet]
  • [FIERY] FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras
    Anthony Hu, Zak Murez, Nikhil Mohan, Sofia Dudas, Jeffrey Hawke, Vijay Badrinarayanan, Roberto Cipolla, Alex Kendall.
    In ICCV 2021. [2104.10490] [wayveai/fiery]
  • [STA] Enabling spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation
    Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden.
    In ICRA 2021. [paper]
  • [EPOSH] Bird’s Eye View Segmentation Using Lifted 2D Semantic Features.
    Isht Dwivedi, Srikanth Malla, Yi-Ting Chen, Behzad Dariush.
    In BMVC 2021. [paper]
  • [PYVA] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation.
    Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan.
    In CVPR 2021. [paper] [JonDoe-297/cross-view]
  • [BEV feat stitch] Understanding Bird's-Eye View of Road Semantics using an Onboard Camera
    Yigit Baran Can, Alexander Liniger, Ozan Unal, Danda Paudel, Luc Van Gool.
    In RA-L 2021. [2012.03040] [ybarancan/BEV_feat_stitch]
  • [LSS] Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D.
    Jonah Philion, Sanja Fidler.
    In ECCV 2020. [2008.05711] [nv-tlabs/lift-splat-shoot]
  • [BEV-Seg] BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud
    Mong H. Ng, Kaahan Radia, Jianfei Chen, Dequan Wang, Ionel Gog, Joseph E. Gonzalez.
    In CVPRW 2020. [2006.11436]
  • [Cam2BEV] A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View.
    Lennart Reiher, Bastian Lampe, Lutz Eckstein.
    In ITSC 2020. [2005.04078] [ika-rwth-aachen/Cam2BEV]
  • [PyrOccNet] Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks
    Thomas Roddick, Roberto Cipolla.
    In . [2003.13402] [tom-roddick/mono-semantic-maps]
  • [MonoLayout] MonoLayout: Amodal scene layout from a single image
    Kaustubh Mani, Swapnil Daga, Shubhika Garg, N. Sai Shankar, Krishna Murthy Jatavallabhula, K. Madhava Krishna.
    In WACV 2020. [2002.08394] [hbutsuak95/monolayout]
  • [VPN] Cross-view Semantic Segmentation for Sensing Surroundings
    Bowen Pan, Jiankai Sun, Ho Yin Tiga Leung, Alex Andonian, Bolei Zhou.
    In RA-L 2020. [1906.03560] [pbw-Berwin/View-Parsing-Network]
  • [OFT] Orthographic Feature Transform for Monocular 3D Object Detection
    Thomas Roddick, Alex Kendall, Roberto Cipolla.
    In BMVC 2019. [1811.08188] [tom-roddick/oft]
  • [VED] Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks.
    Chenyang Lu, Marinus Jacobus Gerardus van de Molengraft, Gijs Dubbelman.
    In RA-L 2019. [1804.02176]
  • Learning to Look around Objects for Top-View Representations of Outdoor Scenes
    Samuel Schulter, Menghua Zhai, Nathan Jacobs, Manmohan Chandraker.
    In ECCV 2018. [1803.10870]
  • [MapNet] MapNet: An Allocentric Spatial Memory for Mapping Environments.
    Joao F. Henriques, Andrea Vedaldi.
    In CVPR 2018. [paper]
  • [Mapping] Automatic Dense Visual Semantic Mapping from Street-Level Imagery.
    Sunando Sengupta, Paul Sturgess, L’ubor Ladický, Philip H. S. Torr.
    In IROS 2012. [paper]

DETR Series

  • [Group DETR] Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment.
    Qiang Chen, Xiaokang Chen, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang.
    In . [2207.13085]
  • [H-DETR] DETRs with Hybrid Matching.
    Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, Han Hu.
    In . [2207.13080] [HDETR/H-Deformable-DETR]
  • [DETR++] DETR++: Taming Your Multi-Scale Detection Transformer.
    Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen.
    In CVPRW 2022. [2206.02977]
  • [Mask DINO] Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
    Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum.
    In . [2206.02777] [IDEACVR/MaskDINO]
  • [DDQ] What Are Expected Queries in End-to-End Object Detection?
    Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Kai Chen.
    In . [2206.01232] [jshilong/DDQ]
  • [Dynamic Sparse R-CNN] Dynamic Sparse R-CNN.
    Qinghang Hong, Fengming Liu, Dong Li, Ji Liu, Lu Tian, Yi Shan.
    In CVPR 2022. [2205.02101]
  • [DINO] DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
    Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
    In . [2203.03605]
  • [DN-DETR] DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
    Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
    In CVPR 2022. [2203.01305] [IDEA-opensource/DN-DETR]
  • [D^2ETR] D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention.
    Junyu Lin, Xiaofeng Mao, Yuefeng Chen, Lei Xu, Yuan He, Hui Xue.
    In . [2203.00860]
  • [DAB-DETR] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.
    Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang.
    In ICLR 2022. [2201.12329] [IDEA-opensource/DAB-DETR]
  • [Deformable Attention] Vision Transformer with Deformable Attention.
    Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang.
    In CVPR 2022. [2201.00520] [LeapLabTHU/DAT]
  • [Sparse DETR] Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
    Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim.
    In ICLR 2022. [2111.14330] [kakaobrain/sparse-detr]
  • [Anchor DETR] Anchor DETR: Query Design for Transformer-Based Object Detection.
    Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun.
    In AAAI 2022. [2109.07107] [megvii-research/AnchorDETR]
  • [Dynamic DETR] Dynamic DETR: End-to-End Object Detection With Dynamic Attention.
    Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang.
    In ICCV 2021. [paper]
  • [Conditional DETR] Conditional DETR for Fast Training Convergence
    Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
    In ICCV 2021. [2108.06152] [Atten4Vis/ConditionalDETR]
  • [Efficient DETR] Efficient DETR: Improving End-to-End Object Detector with Dense Prior
    Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang.
    In . [2104.01318]
  • [SMCA] Fast Convergence of DETR with Spatially Modulated Co-Attention
    Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li.
    In . [2101.07448] [gaopengcuhk/SMCA-DETR]
  • [Sparse R-CNN] Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
    Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo.
    In . [2011.12450] [PeizeSun/SparseR-CNN]
  • [TSP] Rethinking Transformer-based Set Prediction for Object Detection
    Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani.
    In ICCV 2021. [2011.10881]
  • [Deformable DETR] Deformable DETR: Deformable Transformers for End-to-End Object Detection.
    Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
    In ICLR 2021. [2010.04159] [fundamentalvision/Deformable-DETR]
  • [DETR] End-to-End Object Detection with Transformers.
    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
    In [2005.12872] [facebookresearch/detr]