Overview of 3D Object Detection Methods

To make it easier for me to keep track of the papers I've reviewed, I'll compile a list of those related to 3D object detection. This will encompass deep learning-based algorithms as well as multimodal fusion algorithms.

Flowchart (drawio)

paper list

survey

| Method | Title | Author |
| --- | --- | --- |
| object detection | Foreground-Background Imbalance Problem in Deep Object Detectors: A Review | Joya Chen, Tong Xu |
| object detection | A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving | Di Feng, Ali Harakeh, Steven Waslander |
| object detection | An Overview Of 3D Object Detection | Yilin Wang, Jiayi Ye |
| object detection | 3D Object Detection for Autonomous Driving: A Survey | Rui Qian, Xin Lai |
| multimodal | Multi-Modal 3D Object Detection in Autonomous Driving: a Survey | Yingjie Wang, Qiuyu Mao |
| multimodal | Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges | Di Feng, Christian Haase-Schütz |
| multimodal | Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review | Yaodong Cui |

object detection without fusion

Timeline of non-fusion algorithms (drawio)

| Method | Title | Input | Pub. | Author |
| --- | --- | --- | --- | --- |
| Monocular based | Deep3DBox: 3D Bounding Box Estimation Using Deep Learning and Geometry | Monocular Image | CVPR 2017 | Mousavian et al. |
| Monocular based | MonoCon: Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection | Monocular Image | arXiv 2021 | Liu et al. |
| Monocular based | Mono3D-PLiDAR: Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud | Monocular Image | ICCV 2019 | Weng et al. |
| Monocular based | M3DSSD: Monocular 3D Single Stage Object Detector | Monocular Image | CVPR 2021 | Luo et al. |
| Monocular based | MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation | Monocular Image | CVPR 2021 | Chen et al. |
| Stereo based | 3DOP: 3D Object Proposals using Stereo Imagery for Accurate Object Class Detection | Stereo Image | NIPS 2015 | Chen et al. |
| Stereo based | LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector | Stereo Image | ICCV 2021 | Guo et al. |
| Stereo based | CG-Stereo: Confidence Guided Stereo 3D Object Detection with Split Depth Estimation | Stereo Image | IROS 2020 | Li et al. |
| Stereo based | Stereo R-CNN: Stereo R-CNN Based 3D Object Detection for Autonomous Driving | Stereo Image | CVPR 2019 | Li et al. |
| MultiView based | VeloFCN: Vehicle Detection from 3D LiDAR Using Fully Convolutional Network | Front View (FV) | RSS 2016 | Li et al. |
| MultiView based | BirdNet: A 3D Object Detection Framework from LiDAR Information | Bird's Eye View (BEV) | ITSC 2018 | Beltrán et al. |
| MultiView based | PIXOR: Real-time 3D Object Detection from Point Clouds | BEV | CVPR 2018 | Yang et al. |
| MultiView based | HDNET: Exploiting HD Maps for 3D Object Detection | BEV | CoRL 2018 (PMLR) | Yang et al. |
| MultiView based | LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving | Range View (RV) | CVPR 2019 | Meyer et al. |
| Voxel based | VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection | voxel | CVPR 2018 | Zhou et al. |
| Voxel based | SECOND: Sparsely Embedded Convolutional Detection | voxel | Sensors 2018 | Yan et al. |
| Voxel based | PointPillars: Fast Encoders for Object Detection From Point Clouds | voxel | CVPR 2019 | Lang et al. |
| Voxel based | HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection | voxel | CVPR 2020 | Ye et al. |
| Voxel based | HVPR: Hybrid Voxel-Point Representation for Single-Stage 3D Object Detection | voxel | CVPR 2021 | Noh et al. |
| Voxel based | SA-SSD: Structure Aware Single-Stage 3D Object Detection from Point Cloud | voxel | CVPR 2020 | He et al. |
| Point based | PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud | point | CVPR 2019 | Shi et al. |
| Point based | VoteNet: Deep Hough Voting for 3D Object Detection in Point Clouds | point | ICCV 2019 | Qi et al. |
| Point based | Part-A^2: From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network | point | TPAMI 2020 | Shi et al. |
| Point based | PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection | point | CVPR 2020 | Shi et al. |
| Point based | 3DSSD: Point-based 3D Single Stage Object Detector | point | CVPR 2020 | Yang et al. |
| Point based | LiDAR R-CNN: An Efficient and Universal 3D Object Detector | point | CVPR 2021 | Li et al. |
| Point based | 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection | point | CVPR 2021 | Wang et al. |
| Point based | ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection | point | CVPR 2021 | Yang et al. |

multimodal object detection

Algorithm timeline (drawio)

| Title | Pub. | Author |
| --- | --- | --- |
| MV3D: Multi-View 3D Object Detection Network for Autonomous Driving | CVPR 2017 | Chen et al. |
| AVOD: Joint 3D Proposal Generation and Object Detection from View Aggregation | IROS 2018 | Ku et al. |
| SCANet: Spatial-Channel Attention Network for 3D Object Detection | ICASSP 2019 | Lu et al. |
| MVX-Net: Multimodal VoxelNet for 3D Object Detection | ICRA 2019 | Sindagi et al. |
| MMF: Multi-Task Multi-Sensor Fusion for 3D Object Detection | CVPR 2019 | Liang et al. |
| CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection | IROS 2020 | Pang et al. |
| ContFusion: Deep Continuous Fusion for Multi-Sensor 3D Object Detection | ECCV 2018 | Liang et al. |
| PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation | CVPR 2018 | Xu et al. |
| PointPainting: Sequential Fusion for 3D Object Detection | CVPR 2020 | Vora et al. |
| EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection | ECCV 2020 | Huang et al. |
| PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module | AAAI 2020 | Xie et al. |
| MoCa: Multi-Modality Cut and Paste for 3D Object Detection | arXiv 2020 | Zhang et al. |
| PointAugmenting: Cross-Modal Augmentation for 3D Object Detection | CVPR 2021 | Wang et al. |
| ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes | CVPR 2020 | Qi et al. |
| Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving | CVPR 2019 | Wang et al. |
| RoarNet: A Robust 3D Object Detection Based on Region Approximation Refinement | IEEE IV 2019 | Shin et al. |
| Frustum PointNet: Frustum PointNets for 3D Object Detection from RGB-D Data | CVPR 2018 | Qi et al. |
| Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection | IROS 2019 | Wang et al. |

Self-supervised Learning

| Title | Pub. | Author |
| --- | --- | --- |
| Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | 2022 | Meta AI |
| CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding | CVPR 2022 | Mohamed Afham |

Unsupervised Learning

| Title | Pub. | Author |
| --- | --- | --- |
| Unsupervised Learning of Depth from Monocular Videos Using 3D-2D Corresponding Constraints | Remote Sensing 2021 | Jin et al. |
| ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection | CVPR 2021 | Yang et al. |

downsampling in point clouds

| Method | Title |
| --- | --- |
| Farthest point sampling (FPS) | PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space |
| Farthest point sampling (FPS) | ShellNet: Efficient Point Cloud Convolutional Neural Networks Using Concentric Shells Statistics |
| Grid sampling (GS) | RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds |
| Grid sampling (GS) | KPConv: Flexible and Deformable Convolution for Point Clouds |
| Random sampling (RS) | Grid-GCN for Fast and Scalable Point Cloud Learning |
| Critical Points Layer (CPL) | Adaptive Hierarchical Down-Sampling for Point Cloud Classification |
| Weighted Critical Points Layer (WCPL) | Adaptive Hierarchical Down-Sampling for Point Cloud Classification |
| Adaptive Sampling | PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling |
| Feature-FPS (F-FPS) | 3DSSD: Point-based 3D Single Stage Object Detector |
| Semantics-guided Farthest Point Sampling (S-FPS) | SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection |
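
For quick reference, here is a minimal sketch of the two simplest strategies listed above: farthest point sampling and grid sampling. It is not taken from any of the listed papers. The function names `farthest_point_sampling` and `grid_sampling`, the `start` and `cell` parameters, and the choice of the cell centroid as the representative point are all assumptions made for illustration. It uses only the Python standard library (3.8+ for `math.dist`) and is meant for small point sets, not production-scale clouds.

```python
import math
from collections import defaultdict


def farthest_point_sampling(points, k, start=0):
    """Return k indices; each new index is the point farthest from those already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Greedy step: take the point with the largest distance to the selected set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def grid_sampling(points, cell):
    """Return one point per occupied grid cell (assumption: the centroid of the cell's points)."""
    buckets = defaultdict(list)
    for p in points:
        # Integer cell coordinates for a cubic grid with side length `cell`.
        key = tuple(math.floor(c / cell) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]
```

A call such as `farthest_point_sampling(cloud, 1024)` returns indices into `cloud`, while `grid_sampling(cloud, 0.1)` returns new centroid points. FPS as written costs O(N·k) distance evaluations, which is the usual motivation for the cheaper grid and random strategies on large scenes.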

Point Cloud Local Feature Description

| Title | Pub. | Author |
| --- | --- | --- |
| 2D Shape Context: Shape Context: A New Descriptor for Shape Matching and Object Recognition | NeurIPS 2000 | Belongie et al. |
| 3D Shape Context: Recognizing Objects in Range Data Using Regional Point Descriptors | ECCV 2004 | Frome et al. |
| Shape Matching and Object Recognition Using Shape Contexts | TPAMI 2002 | Belongie et al. |
| 3D Shape Descriptor for Objects Recognition | LARS and SBR 2017 | Sales et al. |
| ROI-cloud: A Key Region Extraction Method for LiDAR Odometry and Localization | ICRA 2020 | Zhou et al. |
| PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation | CVPR 2018 | Jiang et al. |

Cooperative Driving Automation

| Title | Pub. | Author |
| --- | --- | --- |
| V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer | ECCV 2022 | Xu et al. |
| Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps | NeurIPS 2022 | Hu et al. |
| CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers | CoRL 2022 | Xu et al. |
| V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction | ECCV 2020 | Wang et al. |
| OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | ICRA 2022 | Xu et al. |
| SyncNet: Latency-Aware Collaborative Perception | ECCV 2022 | Lei et al. |
| CoAlign: Robust Collaborative 3D Object Detection in Presence of Pose Errors | ICRA 2023 | Lu et al. |
| Double-M: Uncertainty Quantification of Collaborative Detection for Self-Driving | ICRA 2023 | Su et al. |
| SCOPE: Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception | ICCV 2023 | Yang et al. |
| MPDA: Bridging the Domain Gap for Multi-Agent Perception | ICRA 2023 | Xu et al. |
| AdaFusion: Adaptive Feature Fusion for Cooperative Perception using LiDAR Point Clouds | WACV 2023 | Qiao et al. |
| CoBEVFlow: Robust Asynchronous Collaborative 3D Detection via Bird's Eye View Flow | NeurIPS 2023 | Wei et al. |
| HEAL: An Extensible Framework for Open Heterogeneous Collaborative Perception | ICLR 2024 | Lu et al. |
| CoHFF: Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles | CVPR 2024 | Song et al. |
| CMiMC: What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception | AAAI 2024 | Su et al. |
| ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration | CVPR 2024 (Highlight) | Wei et al. |

Dataset

| Dataset | Size | Categories / Remarks | Sensing Modalities |
| --- | --- | --- | --- |
| ScanNet | 1,513 scans, 2.5M frames | floor, wall, chair, cabinet, bed, sofa, table, door, window, bookshelf, picture, counter, desk, curtain, refrigerator, shower curtain, toilet, sink, bathtub, other furniture | 3D camera, depth sensors |
| SUN RGB-D | | | |
| SUN3D | | | |
| KITTI | 7,481 frames (training), 80,256 objects | Car, Van, Truck, Pedestrian, Person (sitting), Cyclist, Tram, Misc | Visual (Stereo) camera, 3D LiDAR, GNSS, and inertial sensors |
| nuScenes | 1000 scenes, 1.4M frames (camera, Radar), 390k frames (3D LiDAR) | 25 object classes, such as Car / Van / SUV, different Trucks, Buses, Persons, Animal, Traffic Cone, Temporary Traffic Barrier, Debris, etc. | Visual cameras (6), 3D LiDAR, and Radars (5) |
| BLVD | 120k frames, 249,129 objects | Vehicle, Pedestrian, Rider during day and night | Visual (Stereo) camera, 3D LiDAR |
| Waymo open dataset | 200k frames, 12M objects (3D LiDAR), 1.2M objects (2D camera) | Vehicles, Pedestrians, Cyclists, Signs | 3D LiDAR (5), Visual cameras (5) |
| H3D | 27,721 frames, 1,071,302 objects | Car, Pedestrian, Cyclist, Truck, Misc, Animals, Motorcyclist, Bus | Visual cameras (3), 3D LiDAR |
| Lyft-L5 AV dataset | 55k frames | Semantic HD map included | 3D LiDAR (5), Visual cameras (6) |
| A2D2 | 40k frames (semantics), 12k frames (3D objects), 390k frames unlabeled | Car, Bicycle, Pedestrian, Truck, Small vehicles, Traffic signal, Utility vehicle, Sidebars, Speed bumper, Curbstone, Solid line, Irrelevant signs, Road blocks, Tractor, Non-drivable street, Zebra crossing, Obstacles / trash, Poles, RD restricted area, Animals, Grid structure, Signal corpus, Drivable cobblestone, Electronic traffic, Slow drive area, Nature object, Parking area, Sidewalk, Ego car, Painted driv. instr., Traffic guide obj., Dashed line, RD normal street, Sky, Buildings, Blurred area, Rain dirt | Visual cameras (6); 3D LiDAR (5); Bus data |
| ApolloScape | 143,906 image frames, 89,430 objects | Rover, Sky, Car, Motorcycle, Bicycle, Person, Rider, Truck, Bus, Tricycle, Road, Sidewalk, Traffic Cone, Road Pile, Fence, Traffic Light, Pole, Traffic Sign, Wall, Dustbin, Billboard, Building, Bridge, Tunnel, Overpass, Vegetation | Visual (Stereo) camera, 3D LiDAR, GNSS, and inertial sensors |
| A3D Dataset | 39k frames, 230k objects | Car, Van, Bus, Truck, Pedestrians, Cyclists, and Motorcyclists; afternoon and night, wet and dry | Visual cameras (2); 3D LiDAR |
| DBNet Dataset | Over 10k frames | Seven datasets in total with different test scenarios, such as seaside roads, school areas, mountain roads | 3D LiDAR, Dashboard visual camera, GNSS |
| KAIST multispectral dataset | 7,512 frames, 308,913 objects | Person, Cyclist, Car during day and night, fine time slots (sunrise, afternoon, ...) | |
| PandaSet | | | |

Collaborative

| Dataset | Simulation |
| --- | --- |
| OPV2V | Yes |
| V2V4Real | No |
| V2XSet | Yes |
| V2X-Sim | Yes |