Awesome-BEV-Perception
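This repository is maintained by the team behind the WeChat official account 【自动驾驶之心】. Follow us for the latest technical content!
自动驾驶之心 is the first autonomous driving developer community in China. It offers comprehensive learning roadmaps for autonomous driving and AI (perception/localization/fusion), along with internal referral opportunities at autonomous driving and AI companies.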
I. Overview
1. A review of BEV-based 3D object detection
Vision-Centric BEV Perception: A Survey
2. A roundup of recent BEV perception work
Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe
3. A review of vision-radar fusion for BEV detection
Vision-RADAR fusion for Robotics BEV Detections: A Survey
4. A review of surround-view 3D object detection for autonomous driving
Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey
II. Camera-based BEV
1. List of camera-based BEV perception methods
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
DSGN: Deep Stereo Geometry Network for 3D Object Detection
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-Based 3D Detector
Is Pseudo-Lidar Needed for Monocular 3D Object Detection?
Inverse perspective mapping simplifies optical flow computation and obstacle detection
Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image
Learning to Map Vehicles into Bird’s Eye View
Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography
Driving Among Flatmobiles: Bird-Eye-View Occupancy Grids From a Monocular Camera for Holistic Trajectory Planning
Understanding Bird’s-Eye View of Road Semantics Using an Onboard Camera
Automatic dense visual semantic mapping from street-level imagery
Stacked Homography Transformations for Multi-View Pedestrian Detection
Cross-View Semantic Segmentation for Sensing Surroundings
FISHING Net: Future Inference of Semantic Heatmaps In Grids
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation
Bird’s-Eye-View Panoptic Segmentation Using Monocular Frontal View Images
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
PETR: Position Embedding Transformation for Multi-View 3D Object Detection
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
Translating Images into Maps
GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones
FIERY: Future Instance Prediction in Bird's-Eye View From Surround Monocular Cameras
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
2. Estimation of BEV Appearance and Occupancy Information Based on Surrounding Monocular Images
Estimation of Appearance and Occupancy Information in Bird’s EyeView from Surround Monocular Images
3. BEV representation without camera parameters
Multi-Camera Calibration Free BEV Representation for 3D Object Detection
4. BEVFormer v2
BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision
5. Multi-view subject registration with uncalibrated cameras
From a Bird’s Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
III. LiDAR-based BEV
1. List of LiDAR-based BEV perception methods
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
SECOND: Sparsely Embedded Convolutional Detection
Center-Based 3D Object Detection and Tracking
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
Structure Aware Single-Stage 3D Object Detection From Point Cloud
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection
Object DGCNN: 3D Object Detection using Dynamic Graphs
Voxel Transformer for 3D Object Detection
Embracing Single Stride 3D Object Detector With Sparse Transformer
AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds
PointPillars: Fast Encoders for Object Detection From Point Clouds
2. Point cloud-based pre-training framework
BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training
3. BEV-SAN: Accurate BEV 3D object detection using slice attention
BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
4. Joint training of 2D object detection and LiDAR (2.5D points)
Objects as Spatio-Temporal 2.5D points
5. BEV-LGKD: A unified framework applying knowledge distillation to BEV 3D object detection
BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection
IV. BEV Fusion
1. List of BEV fusion methods
Unifying Voxel-based Representation with Transformer for 3D Object Detection
MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting Through Multi-View Fusion of LiDAR Data
UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
2. X-Align: Improved BEV feature fusion for camera and LiDAR
X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation
3. Radar and LiDAR BEV fusion system
RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection System
4. Summary of BEV-based multimodal fusion methods
PointPainting: Sequential Fusion for 3D Object Detection (CVPR'20)
3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection (ECCV'20)
FUTR3D: A Unified Sensor Fusion Framework for 3D Detection (Arxiv'22)
MVP: Multimodal Virtual Point 3D Detection (NIPS'21)
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection
Unifying Voxel-based Representation with Transformer for 3D Object Detection
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection
CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
V. Summary of BEV-based multi-task learning methods
FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras
StretchBEV: Stretching Future Instance Prediction Spatially and Temporally
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation
STSU: Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Ego3RT: Learning Ego 3D Representation as Ray Tracing
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
PolarFormer: Multi-camera 3D Object Detection with Polar Transformers
VI. Summary of PV2BEV methods
1. Summary of depth-based PV2BEV methods
OFT: Orthographic Feature Transform for Monocular 3D Object Detection
CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection
DSGN: Deep Stereo Geometry Network for 3D Object Detection
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
PanopticSeg: Bird’s-Eye-View Panoptic Segmentation Using Monocular Frontal View Images
FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation
StretchBEV: Stretching Future Instance Prediction Spatially and Temporally
DfM: Monocular 3D Object Detection with Depth from Motion
BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones
Putting People in their Place: Monocular Regression of 3D People in Depth
2. Summary of homography-based PV2BEV methods
IPM: Inverse perspective mapping simplifies optical flow computation and obstacle detection
DSM: Automatic Dense Visual Semantic Mapping from Street-Level Imagery
MapV: Learning to map vehicles into bird’s eye view
BridgeGAN: Generative Adversarial Frontal View to Bird View Synthesis
VPOE: Deep learning based vehicle position and orientation estimation via inverse perspective mapping image
3D-LaneNet: End-to-End 3D Multiple Lane Detection
The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping
Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View
MonoLayout: Amodal Scene Layout from a Single Image
MVNet: Multiview Detection with Feature Perspective Transformation
OGMs: Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning
TrafCam3D: Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography
SHOT: Stacked Homography Transformations for Multi-View Pedestrian Detection
HomoLoss: Homography Loss for Monocular 3D Object Detection