Bird's Eye View Perception

MIT License

Awesome Bird's Eye View Perception

This repository collects papers and projects on Bird's Eye View perception, including 3D object detection, segmentation, online mapping, and occupancy prediction.

News

- 2023.05.09: Initial version with recent papers and projects.
- 2023.05.12: Added papers on 3D object detection.
- 2023.05.14: Added papers on BEV segmentation, HD-map construction, occupancy prediction, and motion planning.

Contents

Papers

Survey

  • Vision-Centric BEV Perception: A Survey (Arxiv 2022) [Paper] [Github]
  • Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe (Arxiv 2022) [Paper] [Github]

3D Object Detection

Radar Lidar

  • RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection System (Arxiv 2023) [Paper]
  • Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection (CVPR 2023) [paper] [Github]
  • MaskBEV: Joint Object Detection and Footprint Completion for Bird’s-eye View 3D Point Clouds (IROS 2023) [Paper] [Github]
  • LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion (Arxiv 2023) [Paper]

Radar Camera

  • CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer (Arxiv 2022) [Paper]
  • RadSegNet: A Reliable Approach to Radar Camera Fusion (Arxiv 2022) [paper]
  • Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection (IEEE TIV 2023) [Paper]
  • CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception (ICLRW 2023) [Paper]
  • RC-BEVFusion: A Plug-In Module for Radar-Camera Bird’s Eye View Feature Fusion (Arxiv 2023) [Paper]
  • RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection (CVPR 2024) [Paper] [Github]
  • UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection (Arxiv 2024) [paper]

Lidar Camera

  • SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird’s-Eye View Representation for 3D Object Detection (Arxiv 2022) [Paper]
  • Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [Paper]
  • EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection (CVPR 2023) [paper] [Github]
  • FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration (Arxiv 2023) [Paper]
  • Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection (Arxiv 2023) [paper]
  • SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection (ICCV 2023) [Paper] [Github]
  • 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion (Arxiv 2023) [Paper]
  • FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion (Arxiv 2023) [paper]
  • Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers (Arxiv 2023) [Paper]
  • PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection (Arxiv 2024) [Paper]
  • Learned Multimodal Compression for Autonomous Driving (IEEE MMSP 2024) [Paper]
  • Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement (Arxiv 2024) [Paper]

Lidar

  • MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection (AAAI 2023) [paper] [Github]
  • PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection (Arxiv 2023) [Paper]
  • V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection (Arxiv 2023) [Paper]
  • SEED: A Simple and Effective 3D DETR in Point Clouds (ECCV 2024) [Paper] [Github]

Monocular

  • Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles (IROS 2019) [Paper] [Project Page]
  • Orthographic Feature Transform for Monocular 3D Object Detection (BMVC 2019) [Paper] [Github]
  • BEV-MODNet: Monocular Camera-based Bird's Eye View Moving Object Detection for Autonomous Driving (ITSC 2021) [Paper] [Project Page]
  • Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021) [Paper] [Github]
  • PersDet: Monocular 3D Detection in Perspective Bird’s-Eye-View (Arxiv 2022) [Paper]
  • Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving (CVPR 2022) [Paper]
  • Monocular 3D Object Detection with Depth from Motion (ECCV 2022) [paper] [Github]
  • MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection (ICCV 2023) [Paper] [Github]
  • S3-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings (Arxiv 2023) [Paper]
  • YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection (Arxiv 2023) [Paper]
  • UniMODE: Unified Monocular 3D Object Detection (CVPR 2024) [Paper]
  • Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving (Arxiv 2024) [paper] [Github]
  • MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method (Arxiv 2024) [Paper]

Multiple Camera

  • Object DGCNN: 3D Object Detection using Dynamic Graphs (NIPS 2021) [Paper] [Github]
  • BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View (Arxiv 2022) [Paper] [Github]
  • DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CORL 2021) [Paper] [Github]
  • BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework (NeurIPS 2022) [Paper] [Github]
  • Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022) [paper] [Github]
  • Polar Parametrization for Vision-based Surround-View 3D Detection (arxiv 2022) [Paper] [Github]
  • SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving (Arxiv 2022) [Paper] [Github]
  • BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv 2022) [Paper] [Github]
  • BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo (Arxiv 2022) [Paper] [Github]
  • MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones (Arxiv 2022) [Paper] [Github]
  • Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection (Arxiv 2022) [Paper]
  • DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention (Arxiv 2022) [Paper]
  • Multi-Camera Calibration Free BEV Representation for 3D Object Detection (Arxiv 2022) [Paper]
  • SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection (IROS 2023) [Paper]
  • BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks (Arxiv 2022) [Paper]
  • STS: Surround-view Temporal Stereo for Multi-view 3D Detection (Arxiv 2022) [Paper]
  • BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection (Arxiv 2022) [Paper]
  • AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (IJCAI 2022) [Paper]
  • Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection (ACM MM 2022) [paper] [Github]
  • ORA3D: Overlap Region Aware Multi-view 3D Object Detection (BMVC 2022) [Paper] [Project Page]
  • AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (ECCV 2022) [Paper] [Github]
  • CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022) [paper] [Github]
  • SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022) [Paper] [Github]
  • Position Embedding Transformation for Multi-View 3D Object Detection (ECCV 2022) [Paper] [Github]
  • BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (AAAI 2023) [Paper] [Github]
  • PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (AAAI 2023) [Paper] [Github]
  • A Simple Baseline for Multi-Camera 3D Object Detection (AAAI 2023) [Paper] [Github]
  • Cross Modal Transformer via Coordinates Encoding for 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion (Arxiv 2023) [Paper] [Github]
  • BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo (Arxiv 2023) [Paper]
  • BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap (Arxiv 2023) [Paper] [Github]
  • DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking (Arxiv 2023) [Paper] [Github]
  • Geometric-aware Pretraining for Vision-centric 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception (Arxiv 2023) [Paper]
  • OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection (Arxiv 2023) [Paper]
  • Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (ICCV 2023) [Paper] [Github]
  • VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection (Arxiv 2023) [Paper]
  • Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability (Arxiv 2023) [Paper]
  • VoxelFormer: Bird’s-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (Arxiv 2023) [Paper] [Github]
  • CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection (ICRA 2023) [Paper] [Github]
  • SOLOFusion: Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection (ICLR 2023) [paper] [Github]
  • BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection (ICLR 2023) [Paper] [Github]
  • UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View (CVPR 2023) [Paper] [Github]
  • Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving (CVPR 2023) [Paper]
  • Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection (CVPR 2023) [Paper] [Github]
  • AeDet: Azimuth-invariant Multi-view 3D Object Detection (CVPR 2023) [Paper] [Github] [Project]
  • BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection (CVPR 2023) [Paper] [Github]
  • CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR 2023) [Paper] [Github]
  • FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection (CVPR 2023) [Paper] [Github]
  • Sparse4D v2: Recurrent Temporal Fusion with Sparse Model (Arxiv 2023) [Paper] [Github]
  • DA-BEV: Depth Aware BEV Transformer for 3D Object Detection (Arxiv 2023) [Paper]
  • BEV-IO: Enhancing Bird’s-Eye-View 3D Detection with Instance Occupancy (Arxiv 2023) [Paper]
  • OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection (Arxiv) [Paper]
  • SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection (ICCV 2023) [Paper] [Github]
  • Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images (Arxiv 2023) [paper]
  • DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting (Arxiv 2023) [Paper]
  • Far3D: Expanding the Horizon for Surround-view 3D Object Detection (Arxiv 2023) [Paper]
  • HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird’s Eye View (Arxiv 2023) [paper]
  • Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github]
  • 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers (ICCV 2023) [Paper] [Github] [Github]
  • FB-BEV: BEV Representation from Forward-Backward View Transformations (ICCV 2023) [paper] [Github]
  • QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection (ICCV 2023) [Paper]
  • SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos (ICCV 2023) [Paper] [Github]
  • NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection (ICCV 2023) [paper] [Github]
  • DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023) [paper]
  • BEVHeight++: Toward Robust Visual Centric 3D Object Detection (Arxiv 2023) [paper]
  • UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities (Arxiv 2023) [Paper]
  • Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving (Arxiv 2023) [Paper]
  • Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github] [Project]
  • CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion (Arxiv 2023) [paper]
  • DynamicBEV: Leveraging Dynamic Queries and Temporal Context for 3D Object Detection (Arxiv 2023) [paper]
  • Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing (Arxiv 2023) [Paper]
  • Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection (NeurIPS 2023) [Paper] [Github]
  • M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D Object Detection (Arxiv 2023) [Paper]
  • Sparse4D v3: Advancing End-to-End 3D Detection and Tracking (Arxiv 2023) [Paper] [Github]
  • BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection (Arxiv 2023) [paper]
  • Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [Paper]
  • Residual Graph Convolutional Network for Bird’s-Eye-View Semantic Segmentation (Arxiv 2023) [Paper]
  • Diffusion-Based Particle-DETR for BEV Perception (Arxiv 2023) [paper]
  • M-BEV: Masked BEV Perception for Robust Autonomous Driving (Arxiv 2023) [Paper]
  • Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps (Arxiv 2023) [Paper]
  • Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [Paper]
  • WidthFormer: Toward Efficient Transformer-based BEV View Transformation (Arxiv 2023) [Paper] [Github]
  • UniVision: A Unified Framework for Vision-Centric 3D Perception (Arxiv 2024) [Paper]
  • DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception (Arxiv 2024) [Paper]
  • Towards Scenario Generalization for Vision-based Roadside 3D Object Detection (Arxiv 2024) [Paper] [Github]
  • CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow (CVPR 2024) [Paper]
  • GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection (Arxiv 2024) [paper]
  • Lifting Multi-View Detection and Tracking to the Bird's Eye View (Arxiv 2024) [paper] [Github]
  • DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection (Arxiv 2024) [Paper]
  • BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection (CVPR 2024) [Paper] [Github]
  • OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection (ECCV 2024) [Paper] [Github]
  • FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection (ECCV 2024) [Paper]
  • PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View (Arxiv 2024) [Paper]
  • GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection (Arxiv 2024) [Paper]
  • Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression (ECCV 2024) [Paper] [Github]
  • MambaBEV: An efficient 3D detection model with Mamba2 (Arxiv 2024) [Paper]
  • ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection (Arxiv 2024) [Paper]

BEV Segmentation

Lidar Camera

  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv 2023) [Paper] [Github]
  • X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation (WACV 2023) [Paper]
  • BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation (ICRA 2023) [Paper] [Github] [Project]
  • UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving (Arxiv 2023) [Paper]
  • BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation (Arxiv 2023) [paper]
  • Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (Arxiv 2023) [paper]
  • LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation (CVPR 2023) [Paper] [Github]
  • BEV-Guided Multi-Modality Fusion for Driving Perception (CVPR 2023) [Paper] [Github]
  • FusionFormer: A Multi-sensory Fusion in Bird’s-Eye-View and Temporal Consistent Transformer for 3D Object Detection (Arxiv 2023) [paper]
  • UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation (ICCV 2023) [Paper] [Github]
  • BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird’s Eye View Map Construction (Arxiv 2023) [Paper]
  • BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation (Arxiv 2024) [paper]
  • OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation (Arxiv 2024) [Paper]

Lidar

  • LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object Detection, and Panoptic Segmentation in a Single Multi-task Network (Arxiv 2022) [paper]
  • SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation (Arxiv 2023) [Paper]
  • BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds (3DV 2023) [Paper] [Github]

Monocular

  • Learning to Look around Objects for Top-View Representations of Outdoor Scenes (ECCV 2018) [paper]
  • A Parametric Top-View Representation of Complex Road Scenes (CVPR 2019) [Paper]
  • Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks (ICRA 2019 IEEE RA-L 2019) [Paper] [Github]
  • Short-Term Prediction and Multi-Camera Fusion on Semantic Grids (ICCVW 2019) [paper]
  • Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks (CVPR 2020) [Paper] [Github]
  • MonoLayout: Amodal scene layout from a single image (WACV 2020) [Paper] [Github]
  • Bird’s Eye View Segmentation Using Lifted 2D Semantic Features (BMVC 2021) [Paper]
  • Enabling Spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation (ICRA 2021) [Paper] [mp4]
  • Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation (CVPR 2021) [Paper] [Github]
  • ViT BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation (IEEE IJCNN 2022) [paper]
  • Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images (IEEE RA-L 2022) [Paper] [Github] [Project]
  • Understanding Bird's-Eye View of Road Semantics using an Onboard Camera (ICRA 2022) [Paper] [Github]
  • “The Pedestrian next to the Lamppost” Adaptive Object Graphs for Better Instantaneous Mapping (CVPR 2022) [Paper]
  • Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts (CVPR 2022) [Paper]
  • Translating Images into Maps (ICRA 2022) [Paper] [Github]
  • GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation (ECCV 2022) [Paper]
  • SBEVNet: End-to-End Deep Stereo Layout Estimation (WACV 2022) [Paper]
  • BEVSegFormer: Bird’s Eye View Semantic Segmentation From Arbitrary Camera Rigs (WACV 2023) [Paper]
  • DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception (Arxiv 2023) [Paper] [Github]
  • HFT: Lifting Perspective Representations via Hybrid Feature Transformation (ICRA 2023) [Paper] [Github]
  • SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images (Arxiv 2023) [Paper]
  • Calibration-free BEV Representation for Infrastructure Perception (Arxiv 2023) [Paper]
  • Semi-Supervised Learning for Visual Bird’s Eye View Semantic Segmentation (Arxiv 2023) [Paper]
  • DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception (Arxiv 2023) [paper] [github] [Project]
  • CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity (Arxiv 2023) [Paper]
  • SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects (CVPR 2024) [Paper] [Github]
  • DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning (Arxiv 2024) [Paper] [Github]
  • Improved Single Camera BEV Perception Using Multi-Camera Training (ITSC 2024) [Paper]
  • Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation (Arxiv 2024) [Paper]

Multiple Camera

  • A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (IEEE ITSC 2020) [Paper] [Github]
  • Cross-view Semantic Segmentation for Sensing Surroundings (IROS 2020 IEEE RA-L 2020) [Paper] [Github] [Project]
  • Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020) [Paper] [Github] [Project]
  • Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022) [Paper] [Github]
  • Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers (CVPRW 2022) [Paper]
  • M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation (Arxiv 2022) [Paper] [Project]
  • BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv 2022) [Paper] [Github]
  • Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer (Arxiv 2022) [Paper] [Github]
  • A Simple Baseline for BEV Perception Without LiDAR (Arxiv 2022) [Paper] [Github] [Project Page]
  • UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View (ICCV 2023) [Paper] [Github]
  • LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation (CORL 2022) [Paper] [Github]
  • CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers (CORL 2022) [Paper] [Github]
  • Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation (CORL 2022) [Paper] [Github]
  • BEVFormer: a Cutting-edge Baseline for Camera-based Detection (ECCV 2022) [Paper] [Github]
  • JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes (ECCV 2022) [Paper] [Github]
  • Learning Ego 3D Representation as Ray Tracing (ECCV 2022) [Paper] [Github]
  • Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (NIPS 2022 Workshop) [Paper] or [Paper] [Github]
  • Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (Arxiv 2023) [Paper] [Github]
  • BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision (CVPR 2023) [Paper]
  • MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (CVPR 2023) [Paper]
  • Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception (ICCV 2023) [Paper] [Github]
  • MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation (ICCV 2023) [paper] [Github] [Project]
  • One Training for Multiple Deployments: Polar-based Adaptive BEV Perception for Autonomous Driving (Arxiv 2023) [Paper]
  • RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions (Arxiv 2023) [paper] [Github] [Project]
  • X-Align++: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation (Arxiv 2023) [Paper]
  • PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View (Arxiv 2023) [paper]
  • Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s-Eye View (ICCV 2023) [Paper]
  • Towards Viewpoint Robustness in Bird’s Eye View Segmentation (ICCV 2023) [Paper] [Project]
  • PointBeV: A Sparse Approach to BeV Predictions (Arxiv 2023) [paper] [Github]
  • DualBEV: CNN is All You Need in View Transformation (Arxiv 2024) [Paper]
  • MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning (Arxiv 2024) [paper]
  • HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (Arxiv 2024) [Paper] [Github]
  • Improving Bird's Eye View Semantic Segmentation by Task Decomposition (CVPR 2024) [Paper] [Github]
  • SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (CVPR 2024) [Paper] [Github]
  • RoadBEV: Road Surface Reconstruction in Bird's Eye View (Arxiv 2024) [Paper] [Github]
  • TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation (Arxiv 2024) [Paper]
  • DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model (Arxiv 2024) [Paper]
  • Bird's-Eye View to Street-View: A Survey (Arxiv 2024) [Paper]
  • LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping (Arxiv 2024) [Paper]
  • Navigation Instruction Generation with BEV Perception and Large Language Models (ECCV 2024) [paper] [Github]
  • GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation (Arxiv 2024) [Paper]
  • MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation (ACM MM 2024) [paper]
  • Robust Bird’s Eye View Segmentation by Adapting DINOv2 (ECCV 2024 Workshop) [Paper]
  • Unveiling the Black Box: Independent Functional Module Evaluation for Bird’s-Eye-View Perception Model (Arxiv 2024) [Paper]
  • RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View (Arxiv 2024) [Paper]
  • OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping (ACCV 2024) [Paper] [Github]

Perception Prediction Planning

Monocular

  • Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning (WACV 2021) [Paper]
  • HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow Prediction (CVPRW 2022) [paper]

Multiple Camera

  • FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV 2021) [Paper] [Github] [Project]
  • NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021) [Paper] [Github]
  • ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (ECCV 2022) [Paper] [Github]
  • StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV 2022) [Paper] [Github] [Project]
  • TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving (CVPR 2023) [Paper] [Github]
  • Planning-oriented Autonomous Driving (CVPR 2023, Occupancy Prediction) [paper] [Github] [Project]
  • Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving (CVPR 2023) [Paper] [Github]
  • ReasonNet: End-to-End Driving with Temporal and Global Reasoning (CVPR 2023) [Paper]
  • LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving (Arxiv 2023) [paper]
  • FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving (Arxiv 2023) [Paper]
  • VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning (Arxiv 2024) [Paper] [Project]
  • SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (Arxiv 2024) [Paper]
  • SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (Arxiv 2024) [paper] [Github]
  • DUALAD: Disentangling the Dynamic and Static World for End-to-End Driving (CVPR 2024) [Paper]
  • Solving Motion Planning Tasks with a Scalable Generative Model (ECCV 2024) [Paper] [Github]

Mapping

Lidar

  • Hierarchical Recurrent Attention Networks for Structured Online Maps (CVPR 2018) [Paper]

Lidar Camera

  • End-to-End Deep Structured Models for Drawing Crosswalks (ECCV 2018) [Paper]
  • Probabilistic Semantic Mapping for Urban Autonomous Driving Applications (IROS 2020) [Paper] [Github]
  • Convolutional Recurrent Network for Road Boundary Extraction (CVPR 2019) [Paper]
  • Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RA-L 2021) [Paper]
  • M^2-3DLaneNet: Multi-Modal 3D Lane Detection (Arxiv 2022) [paper] [Github]
  • HDMapNet: An Online HD Map Construction and Evaluation Framework (ICRA 2022) [paper] [Github] [Project]
  • SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation (Arxiv 2023) [paper] [Github]
  • VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene (Arxiv 2023) [Paper]
  • THMA: Tencent HD Map AI System for Creating HD Map Annotations (AAAI 2023) [paper]

Monocular

  • RoadTracer: Automatic Extraction of Road Networks from Aerial Images (CVPR 2018) [Paper] [Github]
  • DAGMapper: Learning to Map by Discovering Lane Topology (ICCV 2019) [paper]
  • End-to-end Lane Detection through Differentiable Least-Squares Fitting (ICCVW 2019) [paper]
  • VecRoad: Point-based Iterative Graph Exploration for Road Graphs Extraction (CVPR 2020) [Paper] [Github] [Project]
  • Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [paper] [Github]
  • iCurb: Imitation Learning-based Detection of Road Curbs using Aerial Images for Autonomous Driving (ICRA 2021 IEEE RA-L) [paper] [Github] [Project]
  • HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps (CVPR 2021) [paper]
  • Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV 2021) [Paper] [Github]
  • RNGDet: Road Network Graph Detection by Transformer in Aerial Images (IEEE TGRS 2022) [Paper] [Project]
  • RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement (IEEE RA-L 2022) [Paper] [Github] [Project]
  • SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving (ICRA 2022) [paper] [Github]
  • Laneformer: Object-aware Row-Column Transformers for Lane Detection (AAAI 2022) [Paper]
  • Lane-Level Street Map Extraction from Aerial Imagery (WACV 2022) [Paper] [Github]
  • Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior (CVPRW 2022) [paper]
  • PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images (CVPR 2022) [Paper] [Github]
  • Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR 2022) [Paper] [Github]
  • TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction (ECCV 2022) [Paper]
  • CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D (IEEE IVS 2023) [Paper]
  • Polygonizer: An auto-regressive building delineator (ICLRW 2023) [Paper]
  • CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention (ICRA 2023) [Paper]
  • Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection (CVPR 2023) [paper] [Github]
  • Learning and Aggregating Lane Graphs for Urban Automated Driving (Arxiv 2023) [paper]
  • Online Lane Graph Extraction from Onboard Video (Arxiv 2023) [paper] [Github]
  • Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images (Arxiv 2023) [Paper]
  • Prior Based Online Lane Graph Extraction from Single Onboard Camera Image (Arxiv 2023) [Paper]
  • Online Monocular Lane Mapping Using Catmull-Rom Spline (Arxiv 2023) [Paper] [Github]
  • Improving Online Lane Graph Extraction by Object-Lane Clustering (ICCV 2023) [Paper]
  • LATR: 3D Lane Detection from Monocular Images with Transformer (ICCV 2023) [Paper] [Github]
  • Patched Line Segment Learning for Vector Road Mapping (Arxiv 2023) [paper]
  • Sparse Point Guided 3D Lane Detection (ICCV 2023) [Paper] [Github]
  • Recursive Video Lane Detection (ICCV 2023) [Paper] [Github]
  • Occlusion-Aware 2D and 3D Centerline Detection for Urban Driving via Automatic Label Generation (Arxiv 2023) [Paper]
  • Building Lane-Level Maps from Aerial Images (Arxiv 2023) [paper]
  • LaneCPP: Continuous 3D Lane Detection using Physical Priors (CVPR 2024) [Paper]
  • DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles (Arxiv 2024) [Paper] [Github]

Multiple Camera

  • PersFormer: a New Baseline for 3D Laneline Detection (ECCV 2022) [Paper] [Github]
  • Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper] [Github]
  • VAD: Vectorized Scene Representation for Efficient Autonomous Driving (Arxiv 2023) [paper] [Github]
  • InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning (Arxiv 2023) [Paper]
  • VectorMapNet: End-to-end Vectorized HD Map Learning (Arxiv 2023) [Paper] [Github] [Project]
  • Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Topology Reasoning for Driving Scenes (Arxiv 2023) [paper] [Github]
  • MV-Map: Offboard HD-Map Generation with Multi-view Consistency (Arxiv 2023) [paper] [Github]
  • CenterLineDet: Road Lane CenterLine Graph Detection With Vehicle-Mounted Sensors by Transformer for High-definition Map Creation (ICRA 2023) [paper] [Github]
  • Structured Modeling and Learning for Online Vectorized HD Map Construction (ICLR 2023) [paper] [Github]
  • Neural Map Prior for Autonomous Driving (CVPR 2023) [Paper]
  • An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection (Arxiv 2023) [paper]
  • TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture (Arxiv 2023) [paper]
  • PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models (Arxiv 2023) [paper] [Github] [Project]
  • Online Map Vectorization for Autonomous Driving: A Rasterization Perspective (Arxiv 2023) [Paper]
  • NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird’s-Eye-View and BDD-Map Benchmark (Arxiv 2023) [Paper]
  • MachMap: End-to-End Vectorized Solution for Compact HD-Map Construction (CVPR 2023 Workshop) [Paper]
  • Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper]
  • End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve (CVPR 2023) [Paper] [Github]
  • GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping (Arxiv 2023) [Paper]
  • MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction (Arxiv 2023) [Paper]
  • LATR: 3D Lane Detection from Monocular Images with Transformer (Arxiv 2023) [Paper]
  • InsightMapper: A Closer Look at Inner-Instance Information for Vectorized High-Definition Mapping (Arxiv 2023) [Paper] [Project] [Github]
  • HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization (Arxiv 2023) [Paper]
  • StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction (WACV 2024) [Paper] [Github]
  • PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction (ICCV 2023) [Paper]
  • Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach (ICCV 2023) [paper]
  • TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning (Arxiv 2023) [paper] [Github]
  • ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction (CoRL 2023) [Paper] [Github]
  • Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data (Arxiv 2023) [Paper]
  • Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps (Arxiv 2023) [Paper] [Github]
  • P-MapNet: Far-seeing Map Constructor Enhanced by both SDMap and HDMap Priors (ICLR 2024 submitted paper) [Openreview] [Paper]
  • Online Vectorized HD Map Construction using Geometry (Arxiv 2023) [paper] [Github]
  • LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • 3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching (Arxiv 2024) [Paper]
  • MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Stream Query Denoising for Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • ADMap: Anti-disturbance framework for reconstructing online vectorized HD map (Arxiv 2024) [Paper]
  • PLCNet: Patch-wise Lane Correction Network for Automatic Lane Correction in High-definition Maps (Arxiv 2024) [Paper]
  • LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement (AAAI 2024) [paper]
  • VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving (Arxiv 2024) [Paper]
  • CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention (Arxiv 2024) [Paper]
  • Lane2Seq: Towards Unified Lane Detection via Sequence Generation (CVPR 2024) [Paper]
  • Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction (Arxiv 2024) [Paper] [Github]
  • MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping (Arxiv 2024) [paper] [Github]
  • Producing and Leveraging Online Map Uncertainty in Trajectory Prediction (CVPR 2024) [Paper] [Github]
  • MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction (CVPR 2024) [Paper] [Github]
  • HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction (CVPR 2024) [Paper]
  • SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations (Arxiv 2024) [Paper]
  • DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction (Arxiv 2024) [Paper]
  • Is Your HD Map Constructor Reliable under Sensor Corruptions? (Arxiv 2024) [Paper] [Github] [Project]
  • DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation (KDD 2024) [Paper]
  • LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention (ECCV 2024) [Paper] [Github]
  • BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight (Arxiv 2024) [Paper]
  • Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data (Arxiv 2024) [Paper]
  • MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation (ECCV 2024) [Paper]
  • Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks (Arxiv 2024) [Paper] [Github]
  • Generation of Training Data from HD Maps in the Lanelet2 Framework (Arxiv 2024) [Paper]
  • PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction (Arxiv 2024) [paper] [Github]
  • CAMAv2: A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2024) [Paper]
  • HeightLane: BEV Heightmap guided 3D Lane Detection (Arxiv 2024) [paper]
  • PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors (Arxiv 2024) [Paper]
  • Local map Construction Methods with SD map: A Novel Survey (Arxiv 2024) [Paper]
  • Enhancing Vectorized Map Perception with Historical Rasterized Maps (ECCV 2024) [Paper] [Github]
  • GenMapping: Unleashing the Potential of Inverse Perspective Mapping for Robust Online HD Map Construction (Arxiv 2024) [Paper] [Github]
  • GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction (Arxiv 2024) [paper](https://arxiv.org/abs/2409.10063)
  • MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction (Arxiv 2024) [paper]
  • Exploring Semi-Supervised Learning for Online Mapping (Arxiv 2024) [Paper]

Lanegraph

Monocular

  • Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [Paper]
  • AutoGraph: Predicting Lane Graphs from Traffic Observations (IEEE RAL 2023) [Paper]
  • Learning and Aggregating Lane Graphs for Urban Automated Driving (CVPR 2023) [Paper]
  • TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes (Arxiv 2024) [Paper]
  • Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors (Arxiv 2024) [Paper]
  • Learning Lane Graphs from Aerial Imagery Using Transformers (Arxiv 2024) [Paper]
  • TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem (Arxiv 2024) [Paper]
  • LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations (ITSC 2024) [Paper]

Tracking

  • Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer (Arxiv 2022) [Paper] [Github]
  • EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View (Arxiv 2023) [paper] [Github]
  • Traj-MAE: Masked Autoencoders for Trajectory Prediction (Arxiv 2023) [Paper]
  • Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes (Arxiv 2024) [Paper]
  • MapsTP: HD Map Images Based Multimodal Trajectory Prediction for Automated Vehicles (Arxiv 2024) [Paper]
  • Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures (Arxiv 2024) [Paper]
  • Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving (Arxiv 2024) [Paper]
  • VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions (Arxiv 2024) [Paper]

Locate

  • BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images (Arxiv 2022) [paper]
  • BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision (IROS 2022) [Paper]
  • U-BEV: Height-aware Bird’s-Eye-View Segmentation and Neural Map-based Relocalization (Arxiv 2023) [Paper]
  • Monocular Localization with Semantics Map for Autonomous Vehicles (Arxiv 2024) [Paper]

Occupancy Prediction

  • Semantic Scene Completion from a Single Depth Image (CVPR 2017) [Paper]
  • Occupancy Networks: Learning 3D Reconstruction in Function Space (CVPR 2019) [Paper] [Github]
  • S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds (CoRL 2020) [Paper]
  • 3D Semantic Scene Completion: a Survey (IJCV 2021) [Paper]
  • Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data (Arxiv 2021) [Paper]
  • Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion (AAAI 2021) [Paper]
  • Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [Paper]
  • Estimation of Appearance and Occupancy Information in Bird’s Eye View from Surround Monocular Images (Arxiv 2022) [paper] [Project]
  • Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds (IROS 2021) [Paper] [Github]
  • Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review (Arxiv 2023) [paper]
  • LMSCNet: Lightweight Multiscale 3D Semantic Completion (IC 3DV 2020) [Paper] [Github]
  • MonoScene: Monocular 3D Semantic Scene Completion (CVPR 2022) [Paper] [Github] [Project]
  • OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction (ICCV 2023) [Paper] [Github]
  • A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network (Arxiv 2023) [Paper] [Github]
  • OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception (Arxiv 2023) [paper] [Github]
  • Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving (Arxiv 2023) [Paper] [Github] [Project]
  • Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction (Arxiv 2023) [Paper] [Github]
  • StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion (Arxiv 2023) [paper] [Github]
  • Learning Occupancy for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • OVO: Open-Vocabulary Occupancy (Arxiv 2023) [Paper]
  • SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
  • Scene as Occupancy (Arxiv 2023) [Paper](https://arxiv.org/pdf/2306.02851.pdf) [Github]
  • Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data (Arxiv 2023) [Paper] [Github]
  • PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation (Arxiv 2023) [Paper] [Github]
  • UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering (Arxiv 2023) [paper]
  • SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (NeurIPS 2023 D&B track) [paper] [paper]
  • StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks (ICRA 2023) [Paper](https://arxiv.org/pdf/2209.08459.pdf) [Github] [Project]
  • Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction (CVPR 2023) [Paper] [Github]
  • VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction (CVPR 2023) [paper] [Github]
  • Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting (CVPR 2023) [Paper] [Github] [Project]
  • SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion (IROS 2023) [Paper] [Github]
  • CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion (Arxiv 2023) [paper]
  • Symphonize 3D Semantic Scene Completion with Contextual Instance Queries (Arxiv 2023) [Paper] [Github]
  • Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders (Arxiv 2023) [paper]
  • UniWorld: Autonomous Driving Pre-training via World Models (Arxiv 2023) [Paper] [Github]
  • PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction (Arxiv 2023) [paper] [Github]
  • SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection (Arxiv 2023) [paper] [Github]
  • OccupancyDETR: Making Semantic Scene Completion as Straightforward as Object Detection (Arxiv 2023) [Paper] [Github]
  • PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion (Arxiv 2023) [Paper]
  • SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper]
  • NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space (Arxiv 2023) [Github]
  • Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [Github] [Project]
  • RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision (Arxiv 2023) [paper] [Github]
  • LiDAR-based 4D Occupancy Completion and Forecasting (Arxiv 2023) [Paper] [Github]
  • SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints (Arxiv 2023) [Paper]
  • SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction (Arxiv 2023) [Paper] [Github]
  • FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin (Arxiv 2023) [paper]
  • Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications (Arxiv 2023) [paper] [Github]
  • OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion (Arxiv 2023) [Paper]
  • A Simple Framework for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries (Arxiv 2023) [Paper]
  • COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction (Arxiv 2023) [Paper]
  • OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields (Arxiv 2023) [paper] [Github]
  • RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation (Arxiv 2023) [paper]
  • PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Arxiv 2023) [paper] [Project] [Github]
  • POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images (Arxiv 2024) [Paper] [Github]
  • S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper]
  • InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction (Arxiv 2024) [Paper]
  • V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication (Arxiv 2024) [Paper]
  • OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow (Arxiv 2024) [Paper]
  • OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction (Arxiv 2024) [Paper]
  • OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction (Arxiv 2024) [Paper]
  • FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View (ICRA 2024) [Paper]
  • OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [paper]
  • PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (CVPR 2024) [Paper] [Github]
  • Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution (Arxiv 2024) [paper]
  • OccFiner: Offboard Occupancy Refinement with Hybrid Propagation (Arxiv 2024) [Paper]
  • MonoOcc: Digging into Monocular Semantic Occupancy Prediction (ICLR 2024) [Paper]
  • OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
  • Urban Scene Diffusion through Semantic Occupancy Map (Arxiv 2024) [Paper]
  • Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction (CVPR 2024) [Paper]
  • Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation (CVPR 2024) [paper] [Github]
  • OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks (Arxiv 2023) [Paper]
  • OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving (Arxiv 2024) [paper]
  • ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers (Arxiv 2024) [paper]
  • A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective (Arxiv 2024) [Paper]
  • Vision-based 3D occupancy prediction in autonomous driving: a review and outlook (Arxiv 2024) [Paper]
  • GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision (Arxiv 2024) [Paper]
  • RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar (Arxiv 2024) [paper]
  • GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network (Arxiv 2024) [Paper] [Github]
  • PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving (3DV 2024) [paper]
  • UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Arxiv 2024) [paper]
  • Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Arxiv 2024) [Paper] [Github]
  • Occupancy as Set of Points (ECCV 2024) [Paper] [Github]
  • Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion (Arxiv 2024) [Paper]
  • Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction (Arxiv 2024) [Paper]
  • Monocular Occupancy Prediction for Scalable Indoor Scenes (ECCV 2024) [Paper] [Github]
  • LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering (Arxiv 2024) [Paper]
  • VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction (Arxiv 2024) [paper]
  • Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection (Arxiv 2024) [paper] [Github]
  • OccMamba: Semantic Occupancy Prediction with State Space Models (Arxiv 2024) [paper]
  • HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction (IEEE RAL 2024) [paper]
  • Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance (Arxiv 2024) [Paper]
  • MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering (Arxiv 2024) [paper] [Project] [Github]
  • GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting (Arxiv 2024) [paper] [Github]
  • AdaOcc: Adaptive-Resolution Occupancy Prediction (Arxiv 2024) [Paper]
  • Diffusion-Occ: 3D Point Cloud Completion via Occupancy Diffusion (Arxiv 2024) [Paper]
  • UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height (Arxiv 2024) [paper]
  • COCO-Occ: A Benchmark for Occluded Panoptic Segmentation and Image Understanding (Arxiv 2024) [Paper]
  • CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction (ECCV 2024) [Paper] [Github]
  • ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning (Arxiv 2024) [Paper]
  • DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models (Arxiv 2024) [Paper]
  • SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs (Arxiv 2024) [Paper]
  • OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity (Arxiv 2024) [Paper] [Github] [Project]
  • DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects (Arxiv 2024) [Paper]

Occupancy Challenge

  • FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation (CVPR 2023 3D Occupancy Prediction Challenge Workshop) [paper] [Github]
  • Separated RoadTopoFormer (Arxiv 2023) [Paper]
  • OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios (CVPR 2023 Workshop) [Paper] [Github]
  • AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction (CVPR 2024 Workshop) [Paper]
  • Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement (Arxiv 2024) [Paper]

Challenge

  • The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge [Paper]
  • MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report (CVPR 2024 Challenge) [Paper]

Dataset

  • Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark (CVPR 2023) [paper] [Github]
  • SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions (IV 2024) [Paper] [Project] [Github]
  • WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving (Arxiv 2024) [paper] [Project] [Github]
  • WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]

World Model

  • End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2024) [Paper] [Github]
  • Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving (ICRA 2024) [paper] [Github] [Project]
  • Language Prompt for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Arxiv 2023) [paper]
  • GAIA-1: A Generative World Model for Autonomous Driving (Arxiv 2023) [paper]
  • DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving (Arxiv 2023) [paper]
  • Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Learning to Drive Anywhere (CoRL 2023) [Paper]
  • Language-Conditioned Path Planning (Arxiv 2023) [paper]
  • DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (Arxiv 2023) [Paper] [Project]
  • GPT-Driver: Learning to Drive with GPT (Arxiv 2023) [Paper]
  • LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (Arxiv 2023) [paper]
  • Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond (Arxiv 2023) [Paper]
  • DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model (Arxiv 2023) [Paper]
  • UniPAD: A Universal Pre-training Paradigm for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm (Arxiv 2023) [Paper]
  • Uni3D: Exploring Unified 3D Representation at Scale (Arxiv 2023) [Paper] [Github]
  • Video Language Planning (Arxiv 2023) [paper] [Github]
  • RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models (Arxiv 2023) [Paper]
  • DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (Arxiv 2023) [Paper] [Paper] [Project]
  • Vision Language Models in Autonomous Driving and Intelligent Transportation Systems (Arxiv 2023) [Paper]
  • ADAPT: Action-aware Driving Caption Transformer (ICRA 2023) [Paper] [Github]
  • Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models (Arxiv 2023) [Paper] [Project]
  • Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion (Arxiv 2023) [Paper]
  • ADriver-I: A General World Model for Autonomous Driving (Arxiv 2023) [Paper]
  • HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (Arxiv 2023) [Paper]
  • On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving (Arxiv 2023) [paper]
  • GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Arxiv 2023) [Paper]
  • Applications of Large Scale Foundation Models for Autonomous Driving (Arxiv 2023) [Paper]
  • Dolphins: Multimodal Language Model for Driving (Arxiv 2023) [Paper] [Project]
  • Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
  • Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (Arxiv 2023) [Paper] [Github]
  • NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations (Arxiv 2023) [paper] [Github]
  • DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes (Arxiv 2023) [Paper] [Project]
  • Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Dialogue-based generation of self-driving simulation scenarios using Large Language Models (Arxiv 2023) [Paper] [Github]
  • Panacea: Panoramic and Controllable Video Generation for Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
  • LingoQA: Video Question Answering for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • DriveLM: Driving with Graph Visual Question Answering (Arxiv 2023) [Paper] [Github]
  • LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (Arxiv 2023) [Paper] [Project]
  • LMDrive: Closed-Loop End-to-End Driving with Large Language Models (Arxiv 2023) [Paper] [Github]
  • Visual Point Cloud Forecasting enables Scalable Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation (Arxiv 2023) [Paper] [Github]
  • Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (Arxiv 2024) [Paper] [Github]
  • DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (Arxiv 2024) [Paper]
  • A Survey on Multimodal Large Language Models for Autonomous Driving (WACVW 2024) [Paper]
  • VLP: Vision Language Planning for Autonomous Driving (Arxiv 2023) [Paper]
  • Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities (Arxiv 2024) [Paper]
  • MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation (Arxiv 2024) [Paper]
  • Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents (Arxiv 2024) [Paper]
  • DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Arxiv 2024) [Paper] [Github]
  • GenAD: Generative End-to-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Generalized Predictive Model for Autonomous Driving (CVPR 2024) [Paper]
  • AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving (Arxiv 2024) [paper]
  • DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving (Arxiv 2024) [Paper]
  • SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control (Arxiv 2024) [Paper] [Project]
  • DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (Arxiv 2024) [Paper] [Project] [Github]
  • DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (ICLR 2024) [Paper] [Paper]
  • OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (Arxiv 2024) [Paper]
  • GAD-Generative Learning for HD Map-Free Autonomous Driving (Arxiv 2024) [Paper]
  • Guiding Attention in End-to-End Driving Models (Arxiv 2024) [Paper]
  • Probing Multimodal LLMs as World Models for Driving (Arxiv 2024) [Paper]
  • Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models (Arxiv 2024) [Paper]
  • Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (Arxiv 2024) [Paper]
  • Unified End-to-End V2X Cooperative Autonomous Driving (Arxiv 2024) [paper]
  • DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving (Arxiv 2024) [paper]
  • MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving (Arxiv 2024) [Paper]
  • MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes (Arxiv 2024) [Paper]
  • Language-Image Models with 3D Understanding (Arxiv 2024) [paper] [Project]
  • Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? (Arxiv 2024) [Paper]
  • GFlow: Recovering 4D World from Monocular Video (Arxiv 2024) [Paper] [Github]
  • Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability (Arxiv 2024) [Paper] [Github] [Project]
  • OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github] [Project]
  • DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences (Arxiv 2024) [Paper] [Github]
  • AD-H: Autonomous Driving with Hierarchical Agents (Arxiv 2024) [Paper]
  • Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • A Superalignment Framework in Autonomous Driving with Large Language Models (Arxiv 2024) [Paper]
  • Enhancing End-to-End Autonomous Driving with Latent World Model (Arxiv 2024) [Paper]
  • SimGen: Simulator-conditioned Driving Scene Generation (Arxiv 2024) [paper]
  • Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset (Arxiv 2024) [paper] [Project]
  • WonderWorld: Interactive 3D Scene Generation from a Single Image (Arxiv 2024) [Paper]
  • CarLLaVA: Vision language models for camera-only closed-loop driving (Arxiv 2024) [Paper]
  • End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation (Arxiv 2024) [paper]
  • BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space (Arxiv 2024) [Paper] [Github]
  • Exploring the Causality of End-to-End Autonomous Driving (Arxiv 2024) [paper] [Github]
  • SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving (Arxiv 2024) [Paper]
  • DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving (Arxiv 2024) [Paper]
  • Open 3D World in Autonomous Driving (Arxiv 2024) [Paper]
  • CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (Arxiv 2024) [Paper]
  • Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (Arxiv 2024) [Paper]
  • DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving (Arxiv 2024) [Paper]
  • OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving (Arxiv 2024) [Paper]
  • Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving (Arxiv 2024) [Paper]
  • ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models (ITSC 2024) [Paper]
  • MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving (Arxiv 2024) [paper]
  • RenderWorld: World Model with Self-Supervised 3D Label (Arxiv 2024) [Paper]
  • Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving (Arxiv 2024) [Paper]
  • DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input (Arxiv 2024) [Paper] [Project] [Github]
  • METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance (Arxiv 2024) [paper]
  • Does End-to-End Autonomous Driving Really Need Perception Tasks? (Arxiv 2024) [Paper]
  • Learning to Drive via Asymmetric Self-Play (Arxiv 2024) [Paper]
  • Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models (Arxiv 2024) [paper]
  • ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding (Arxiv) [Paper]
  • HE-Drive: Human-Like End-to-End Driving with Vision Language Models (Arxiv 2024) [Paper] [Project] [Paper]
  • UniDrive: Towards Universal Driving Perception Across Camera Configurations (Arxiv 2024) [Paper] [Github]
  • DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (Arxiv 2024) [paper] [Github] [Project]
  • DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model (NeurIPS 2024) [Paper] [Project] [Github]

Other

  • Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views (AAAI 2021) [Paper] [Github] [Project]
  • Trans4Map: Revisiting Holistic Bird’s-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers (WACV 2023) [Paper]
  • ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view (Arxiv 2022) [paper]
  • 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View (Arxiv 2023) [Paper] [Github] [Project]
  • F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving (Arxiv 2023) [Paper]
  • NVAutoNet: Fast and Accurate 360° 3D Visual Perception For Self Driving (Arxiv 2023) [Paper]
  • FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems (Arxiv 2023) [Paper]
  • Aligning Bird-Eye View Representation of PointCloud Sequences using Scene Flow (IEEE IV 2023) [Paper] [Github]
  • MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird’s Eye View based Appearance and Motion Features (Arxiv 2023) [Paper]
  • WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models (Arxiv 2023) [Paper] [Github] [Project]
  • Leveraging BEV Representation for 360-degree Visual Place Recognition (Arxiv 2023) [Paper]
  • NMR: Neural Manifold Representation for Autonomous Driving (Arxiv 2023) [Paper]
  • V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022) [Paper] [Github]
  • DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 2022) [Paper] [Github]
  • Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task (CVPR 2022) [Paper] [Github] [Project]
  • A Motion and Accident Prediction Benchmark for V2X Autonomous Driving (Arxiv 2023) [Paper] [Project]
  • BEVBert: Multimodal Map Pre-training for Language-guided Navigation (ICCV 2023) [Paper]
  • V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting (Arxiv 2023) [Paper] [Github] [Project]
  • BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image (CVPR 2023) [paper] [Github]
  • BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird’s-Eye-View in Dynamic Scenarios (Arxiv 2023) [Paper]
  • Bird’s-Eye-View Scene Graph for Vision-Language Navigation (Arxiv 2023) [paper]
  • OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data (Arxiv 2023) [paper]
  • Hidden Biases of End-to-End Driving Models (ICCV 2023) [Paper] [Github](https://github.com/autonomousvision/carla_garage)
  • EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps (Arxiv 2023) [Paper]
  • End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2023) [paper] [Github]
  • BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View Images (ICCV 2023) [paper]
  • I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird’s Eye View Projections (IROS 2023) [Paper]
  • Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving (Arxiv 2023) [Paper] [Project]
  • BEV-DG: Cross-Modal Learning under Bird’s-Eye View for Domain Generalization of 3D Semantic Segmentation (ICCV 2023) [paper]
  • MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (ICCV 2023) [Paper] [Github] [Project]
  • Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [Paper] [Github]
  • Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions (ICCV 2023) [Paper]
  • QUEST: Query Stream for Vehicle-Infrastructure Cooperative Perception (Arxiv 2023) [paper]
  • Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction (Arxiv 2023) [Paper]
  • SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection (Arxiv 2023) [paper]
  • Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review (Arxiv 2023) [Paper]
  • BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving (Arxiv 2023) [paper]
  • BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (Arxiv 2023) [Paper]
  • Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception (Arxiv 2023) [paper]
  • PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds (Arxiv 2023) [paper]
  • BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird's-Eye-View (Arxiv 2023) [Paper] [Github]
  • BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation (Arxiv 2023) [Paper]
  • UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-View Cameras in Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
  • All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes (Arxiv 2023) [paper]
  • BEVSeg2TP: Surround View Camera Bird’s-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction (Arxiv 2023) [Paper]
  • BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout (Arxiv 2023) [Paper]
  • EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI (Arxiv 2023) [Paper] [Github]
  • A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2023) [paper]
  • C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation (Arxiv 2023) [paper]
  • Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals (Arxiv 2024) [Paper]
  • GeoDecoder: Empowering Multimodal Map Understanding (Arxiv 2024) [Paper]
  • Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird’s-Eye-View (Arxiv 2024) [Paper]
  • Text2Street: Controllable Text-to-image Generation for Street Views (Arxiv 2024) [paper]
  • Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps (Arxiv 2024) [Paper]
  • BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues (Arxiv 2024) [paper]
  • OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
  • Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving (Arxiv 2024) [paper]
  • Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion (Arxiv 2024) [Paper] [Github]
  • M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving (Arxiv 2024) [Paper]
  • MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors (Arxiv 2024) [Paper]
  • Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization (Arxiv 2024) [Paper]
  • MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps (Arxiv 2024) [Paper]
  • Neural Semantic Map-Learning for Autonomous Vehicles (Arxiv 2024) [Paper]
  • AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction (Arxiv 2024) [Paper] [Project]
  • MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability (Arxiv 2024) [paper] [Github]
  • SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm (Arxiv 2024) [Paper] [Github]
  • UrbanWorld: An Urban World Model for 3D City Generation (Arxiv 2024) [Paper]
  • From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model (ICRA 2024) [Paper]
  • Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation (Arxiv 2024) [paper]
  • DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation (Arxiv 2024) [Paper] [Project]
  • BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving (Arxiv 2024) [Paper]