Currently 1st place in KITTI BEV and 3rd in KITTI 3D. The detector can run at 25 FPS.
Authors: Chenhang He, Hui Zeng, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang.
Current single-stage detectors achieve their efficiency by progressively downscaling the 3D point cloud in a fully convolutional manner. However, the downscaled features inevitably lose spatial information and cannot make full use of the structure information of the 3D point cloud, which degrades their localization precision. In this work, we propose to improve the localization precision of single-stage detectors by explicitly leveraging the structure information of the 3D point cloud. Specifically, we design an auxiliary network that converts the convolutional features in the backbone network back to point-level representations. The auxiliary network is jointly optimized with two point-level supervisions, which guide the convolutional features in the backbone network to be aware of the object structure. The auxiliary network can be detached after training and therefore introduces no extra computation at inference. In addition, since single-stage detectors suffer from a discordance between the predicted bounding boxes and their classification confidences, we develop an efficient part-sensitive warping operation that aligns the confidences to the predicted bounding boxes.
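The two mechanisms described above can be illustrated with a short NumPy sketch. It is not the repository's implementation, and every name in it is hypothetical. Assumptions: the backbone output is available as features at non-empty voxel centers; the voxel-to-point conversion uses inverse-distance weighting over the k nearest voxel centers (the paper's exact neighborhood and weighting may differ); and a box is given in BEV feature-map pixel coordinates as (cx, cy, length, width, yaw), split into a grid of K = gx * gy parts, where part k is scored by bilinearly sampling the k-th part-sensitive map at that part's center.

```python
import numpy as np


def idw_interpolate(voxel_xyz, voxel_feats, points, k=3, eps=1e-8):
    """Map voxel-level features back to raw points by inverse-distance weighting.

    voxel_xyz:   (M, 3) centers of non-empty voxels (assumed input format)
    voxel_feats: (M, C) backbone features at those voxel centers
    points:      (N, 3) raw point coordinates
    Returns an (N, C) array of point-wise features.
    """
    # Squared distance from every point to every voxel center: (N, M).
    d2 = ((points[:, None, :] - voxel_xyz[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, voxel_xyz.shape[0])
    # Indices and squared distances of the k nearest voxel centers: (N, k).
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn_idx, axis=1)
    # Weights proportional to 1 / distance, normalized to sum to 1 per point.
    w = 1.0 / (np.sqrt(nn_d2) + eps)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted sum of neighbor features: (N, k, 1) * (N, k, C) -> (N, C).
    return (w[:, :, None] * voxel_feats[nn_idx]).sum(axis=1)


def bilinear_sample(fmap, x, y):
    """Bilinearly sample a 2D map at continuous (x, y); neighbors outside the map count as 0."""
    h, w = fmap.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < w and 0 <= yi < h:
                val += (1.0 - abs(x - xi)) * (1.0 - abs(y - yi)) * fmap[yi, xi]
    return val


def part_sensitive_warp(part_maps, box, grid=(2, 2)):
    """Confidence of one box = mean over parts k of map k sampled at part k's center.

    part_maps: (K, H, W) part-sensitive classification maps, K = gx * gy
    box:       (cx, cy, length, width, yaw) in pixel coordinates (assumed layout)
    """
    gx, gy = grid
    if part_maps.shape[0] != gx * gy:
        raise ValueError("need one classification map per part")
    cx, cy, length, width, yaw = box
    cos_t, sin_t = np.cos(yaw), np.sin(yaw)
    scores = []
    for i in range(gx):          # parts along the box length
        for j in range(gy):      # parts along the box width
            # Center of sub-window (i, j) in the box's local frame.
            u = (i + 0.5) / gx * length - length / 2
            v = (j + 0.5) / gy * width - width / 2
            # Rotate by the yaw angle and translate to the box center.
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            scores.append(bilinear_sample(part_maps[i * gy + j], x, y))
    return float(np.mean(scores))
```

Because each part reads only its own map, the averaged score is high only when every part of the predicted box lands on evidence for that part, which is the intuition behind aligning confidences with box placement.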
python3.5+
pytorch (tested on 1.1.0)
opencv
shapely
mayavi
spconv
- Clone this repository.
- Compile the C++/CUDA modules in mmdet/ops by running the following commands in each directory, e.g.:
$ cd mmdet/ops/points_op
$ python3 setup.py build_ext --inplace
- Set up the following environment variables (you may add them to ~/.bashrc), adjusting the paths to match your CUDA and spconv installations:
export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
export LD_LIBRARY_PATH=/home/billyhe/anaconda3/lib/python3.7/site-packages/spconv;
- Download the 3D KITTI detection dataset from here. The data to download includes:
- Velodyne point clouds (29 GB): input data to VoxelNet
- Training labels of object data set (5 MB): input label to VoxelNet
- Camera calibration matrices of object data set (16 MB): for visualization of predictions
- Left color images of object data set (12 GB): for visualization of predictions
- Create cropped point clouds and a sample pool for data augmentation (please refer to SECOND):
$ python3 tools/create_data.py
- Split the training set into training and validation sets according to the protocol here.
└── DATA_DIR
    ├── training   <-- training data
    |   ├── image_2
    |   ├── label_2
    |   ├── velodyne
    |   └── velodyne_reduced
    └── testing    <-- testing data
        ├── image_2
        ├── label_2
        ├── velodyne
        └── velodyne_reduced
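Before training, it may help to confirm that the environment variables and the dataset layout described in the steps above are in place. The following is a hypothetical helper, not part of the repository. It only checks that the NUMBAPRO_* variables point to an existing file or directory and that DATA_DIR contains the sub-directories shown in the tree; it does not validate LD_LIBRARY_PATH, CUDA versions, or file contents. It avoids f-strings so that it also runs on Python 3.5.

```python
import os
import sys
from pathlib import Path

# Variables from the installation step and whether each should name a file or a directory.
ENV_VARS = {
    "NUMBAPRO_CUDA_DRIVER": "file",
    "NUMBAPRO_NVVM": "file",
    "NUMBAPRO_LIBDEVICE": "dir",
}

# Sub-directories expected under DATA_DIR/training and DATA_DIR/testing (from the tree above).
SPLITS = ("training", "testing")
SUBDIRS = ("image_2", "label_2", "velodyne", "velodyne_reduced")


def check_env(env=None):
    """Return a list of problems with the NUMBAPRO_* variables (an empty list means OK)."""
    env = os.environ if env is None else env
    problems = []
    for name, kind in ENV_VARS.items():
        value = env.get(name)
        if not value:
            problems.append("{} is not set".format(name))
            continue
        path = Path(value)
        ok = path.is_file() if kind == "file" else path.is_dir()
        if not ok:
            problems.append("{}={} is not an existing {}".format(name, value, kind))
    return problems


def check_data_dir(data_dir):
    """Return the expected sub-directories that are missing under data_dir."""
    root = Path(data_dir)
    return [str(root / s / d) for s in SPLITS for d in SUBDIRS
            if not (root / s / d).is_dir()]


if __name__ == "__main__":
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "DATA_DIR"
    issues = check_env() + check_data_dir(data_dir)
    print("\n".join(issues) if issues else "setup looks OK")
```

Saved under an assumed name such as check_setup.py, it would be run as `python3 check_setup.py /path/to/DATA_DIR`.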
You can download the pretrained model here, which is trained on the train split (3712 samples) and evaluated on the val split (3769 samples) and the test split (7518 samples). The performance (using 40 recall positions) on the validation set is as follows:
Car AP@0.70, 0.70, 0.70:
bbox AP:99.12, 96.09, 93.61
bev AP:96.55, 92.79, 90.32
3d AP:93.13, 84.54, 81.71
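The "40 recall positions" metric averages interpolated precision at recall levels 1/40, 2/40, ..., 40/40, where the interpolated precision at level r is the maximum precision achieved at any recall of at least r. The sketch below shows only that final averaging step, assuming a precision/recall curve has already been computed; it is not the KITTI devkit code and omits box matching, IoU thresholds, and difficulty filtering.

```python
import numpy as np


def average_precision_r40(recall, precision):
    """AP (in percent) averaged over the 40 recall positions k/40, k = 1..40.

    recall, precision: 1-D arrays describing a precision/recall curve,
    one entry per score threshold. At each recall position r the interpolated
    precision is the max precision over curve points with recall >= r,
    or 0 if the curve never reaches r.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    # k / 40 computed by exact division, so e.g. 20/40 is exactly 0.5.
    positions = np.arange(1, 41) / 40.0
    interp = [precision[recall >= r].max() if np.any(recall >= r) else 0.0
              for r in positions]
    return 100.0 * float(np.mean(interp))
```

For example, a detector that reaches recall 0.5 at precision 1.0 and never recalls more scores 50.0: the first 20 positions contribute 1.0 and the remaining 20 contribute 0.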
To train the SA-SSD with a single GPU, run the following command:
cd mmdet/tools
python3 train.py ../configs/car_cfg.py
To train the SA-SSD with multiple GPUs, run the following command:
bash dist_train.sh
To evaluate the model, run the following command:
cd mmdet/tools
python3 test.py ../configs/car_cfg.py ../saved_model_vehicle/epoch_50.pth
If you find this work useful in your research, please consider citing:
@inproceedings{he2020sassd,
title={Structure Aware Single-stage 3D Object Detection from Point Cloud},
author={He, Chenhang and Zeng, Hui and Huang, Jianqiang and Hua, Xian-Sheng and Zhang, Lei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2020}
}
The code is developed based on mmdetection, and some parts of the code are borrowed from SECOND and PointRCNN.