Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection using RGB Camera and LiDAR

Primary language: Python. License: MIT.

Authors: Anshul Paigwar, David Sierra-Gonzalez, Ozgur Erkent, Christian Laugier

Introduction

This repository is the code release for our Frustum-PointPillars paper, published at the IEEE/CVF International Conference on Computer Vision (ICCV 2021) Workshop on Autonomous Vehicle Vision.

Abstract

Accurate 3D object detection is a key part of the perception module for autonomous vehicles. A better understanding of the objects in 3D facilitates better decision-making and path planning. RGB cameras and LiDAR are the most commonly used sensors in autonomous vehicles for environment perception. Many approaches have shown promising results for 2D detection with RGB images, but efficiently localizing small objects like pedestrians in the 3D point cloud of large scenes has remained a challenging area of research. We propose a novel method, Frustum-PointPillars, for 3D object detection using LiDAR data. Instead of relying solely on point cloud features, we leverage the mature field of 2D object detection to reduce the search space in 3D. Then, we use the Pillar Feature Encoding network for object localization in the reduced point cloud. We also propose a novel approach for masking point clouds to further improve the localization of objects. We train our network on the KITTI dataset and perform experiments to show the effectiveness of our network. On the KITTI test set, our method outperforms other multi-sensor SOTA approaches for 3D pedestrian localization (Bird’s Eye View) while achieving a significantly faster runtime of 14 Hz.
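The idea of using a 2D detection to shrink the 3D search space can be illustrated by keeping only the LiDAR points whose image projection falls inside a 2D bounding box (a frustum). The snippet below is a minimal NumPy sketch of that filtering step, not the repository's implementation: the function name `frustum_mask`, the assumption that points are already in the camera frame, and the toy camera matrix are all hypothetical, and the paper's additional point-cloud masking step is not reproduced.

```python
import numpy as np


def frustum_mask(points_cam, P, box2d):
    """Boolean mask of points whose image projection lies inside a 2D box.

    points_cam: (N, 3) points already in the camera frame
                (assumption: the LiDAR-to-camera transform was applied earlier).
    P:          (3, 4) camera projection matrix.
    box2d:      (x_min, y_min, x_max, y_max) of a 2D detection, in pixels.
    """
    n = points_cam.shape[0]
    pts_h = np.hstack([points_cam, np.ones((n, 1))])  # homogeneous coords, (N, 4)
    proj = pts_h @ P.T                                # image-plane coords, (N, 3)
    depth = proj[:, 2]
    in_front = depth > 0                              # drop points behind the camera
    safe_depth = np.where(in_front, depth, 1.0)       # avoid dividing by zero
    u = proj[:, 0] / safe_depth
    v = proj[:, 1] / safe_depth
    x_min, y_min, x_max, y_max = box2d
    inside = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return in_front & inside


# Hypothetical pinhole camera: focal length 100 px, principal point (50, 50).
P = np.array([[100.0, 0.0, 50.0, 0.0],
              [0.0, 100.0, 50.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
points = np.array([[0.0, 0.0, 10.0],    # projects to (50, 50): inside the box
                   [5.0, 0.0, 10.0],    # projects to (100, 50): outside the box
                   [0.0, 0.0, -10.0]])  # behind the camera
mask = frustum_mask(points, P, (40, 40, 60, 60))
print(mask)  # [ True False False]
```

Only the points selected by such a mask would then need to be passed to the 3D localization stage, which is where the reduction in search space comes from.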

Getting Started

We thank the authors of PointPillars and the SECOND detector. This repository is forked from nuTonomy's PointPillars implementation (built on SECOND) for KITTI object detection.

Code Support

Only Python 3.6+ and PyTorch 1.4+ are supported. The code has only been tested on Ubuntu 18.04.
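A quick way to confirm an environment meets these minimums is a small version check. This is a hypothetical helper, not part of the repository; the function names `parse_version` and `meets` are illustrative, and only the numeric release fields are compared (build tags such as `+cu101` are ignored).

```python
import sys


def parse_version(version):
    """Turn a version string like '1.4.0+cu101' into a tuple of ints (1, 4, 0)."""
    core = version.split("+")[0]          # drop local build tags such as '+cu101'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:                  # keep only the leading digits ('0rc1' -> '0')
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets(version, minimum):
    """True if `version` is at least `minimum`, comparing len(minimum) fields."""
    return parse_version(version)[:len(minimum)] >= minimum


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info[:2] >= (3, 6))
    try:
        import torch
        print("PyTorch >= 1.4:", meets(torch.__version__, (1, 4)))
    except ImportError:
        print("PyTorch is not installed")
```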

Install

1. Clone code

git clone https://github.com/anshulpaigwar/Frustum-Pointpillars.git

2. Install Python packages

You can use pip or the Anaconda package manager to install the following packages.

pip install --upgrade pip
pip install fire tensorboardX shapely pybind11 protobuf scikit-image numba pillow sparsehash

Finally, install SparseConvNet. This is not required for PointPillars, but the general SECOND code base expects this to be correctly configured.

git clone git@github.com:facebookresearch/SparseConvNet.git
cd SparseConvNet/
bash build.sh
# NOTE: if bash build.sh fails, try bash develop.sh instead

Additionally, you may need to install Boost geometry:

sudo apt-get install libboost-all-dev

3. PYTHONPATH

Add the cloned Frustum-Pointpillars/ directory to your PYTHONPATH.

Prepare dataset

1. Dataset preparation

Download the KITTI 3D object detection dataset and create the following directories:

└── KITTI_DATASET_ROOT
       ├── training    <-- 7481 train data
       |   ├── image_2 <-- for visualization
       |   ├── calib
       |   ├── label_2
       |   ├── velodyne
       |   └── velodyne_reduced <-- empty directory
       └── testing     <-- 7518 test data
           ├── image_2 <-- for visualization
           ├── calib
           ├── velodyne
           └── velodyne_reduced <-- empty directory
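The directory skeleton above can also be created, and the downloaded scans counted, with a short script. This is a hypothetical helper (the names `make_layout` and `count_scans` are illustrative); it only creates missing directories and assumes the LiDAR scans are the usual KITTI `*.bin` files in each `velodyne` folder.

```python
import sys
from pathlib import Path

# Sub-directories listed in the tree above.
LAYOUT = {
    "training": ["image_2", "calib", "label_2", "velodyne", "velodyne_reduced"],
    "testing": ["image_2", "calib", "velodyne", "velodyne_reduced"],
}
# Number of frames in the official KITTI 3D object detection splits.
EXPECTED_FRAMES = {"training": 7481, "testing": 7518}


def make_layout(root):
    """Create any missing directories; existing ones are left untouched."""
    root = Path(root)
    for split, names in LAYOUT.items():
        for name in names:
            (root / split / name).mkdir(parents=True, exist_ok=True)


def count_scans(root):
    """Count the LiDAR scans (*.bin) in each split's velodyne directory."""
    root = Path(root)
    return {split: len(list((root / split / "velodyne").glob("*.bin")))
            for split in LAYOUT}


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python this_script.py /path/to/KITTI_DATASET_ROOT
    kitti_root = sys.argv[1]
    make_layout(kitti_root)
    for split, found in count_scans(kitti_root).items():
        print(f"{split}: {found} / {EXPECTED_FRAMES[split]} velodyne scans")
```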

Note: the PointPillars config (.proto) files use KITTI_DATASET_ROOT=/data/sets/kitti_second/ by default.

2. Create KITTI infos:

python create_data.py create_kitti_info_file --data_path=KITTI_DATASET_ROOT

3. Create reduced point cloud:

python create_data.py create_reduced_point_cloud --data_path=KITTI_DATASET_ROOT

4. Create groundtruth-database infos:

python create_data.py create_groundtruth_database --data_path=KITTI_DATASET_ROOT
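Before editing the config, it can help to sanity-check the generated .pkl files. The helper below is a hypothetical sketch that assumes each file holds either a list of per-frame dicts (as the `kitti_infos_*.pkl` names suggest) or a dict mapping class names to lists of samples (a common layout for `kitti_dbinfos_train.pkl`); the exact schema written by create_data.py is not documented here, so treat the output as a rough summary only.

```python
import pickle


def summarize_info_file(pkl_path):
    """Describe an info file without assuming too much about its schema.

    Assumption: the file holds either a list of per-frame dicts or a dict
    mapping class names to lists of samples.
    """
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    if isinstance(data, dict):
        return {name: len(items) for name, items in data.items()}
    first = data[0] if len(data) > 0 else {}
    keys = sorted(first.keys()) if isinstance(first, dict) else []
    return {"num_frames": len(data), "keys_of_first_entry": keys}
```

For example, `summarize_info_file("KITTI_DATASET_ROOT/kitti_infos_train.pkl")`, assuming the infos were written next to the dataset; check the script's output for the actual location.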

5. Modify config file

Edit the config file so that it points to the dataset and the info files created above:

train_input_reader: {
  ...
  database_sampler {
    database_info_path: "/path/to/kitti_dbinfos_train.pkl"
    ...
  }
  kitti_info_path: "/path/to/kitti_infos_train.pkl"
  kitti_root_path: "KITTI_DATASET_ROOT"
}
...
eval_input_reader: {
  ...
  kitti_info_path: "/path/to/kitti_infos_val.pkl"
  kitti_root_path: "KITTI_DATASET_ROOT"
}
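Instead of editing the three path fields by hand, they can be rewritten with a regular expression over the config text. This is a hypothetical sketch (`repoint_config` is an illustrative name), assuming the fields appear as `name: "value"` exactly as shown above and that all .pkl info files sit directly in the dataset root, so only their file names are kept.

```python
import posixpath
import re
import sys

# Matches the path fields shown above, e.g.  kitti_info_path: "/path/to/kitti_infos_train.pkl"
PATH_FIELD = re.compile(
    r'\b(database_info_path|kitti_info_path|kitti_root_path)(\s*:\s*)"([^"]*)"')


def repoint_config(text, data_root):
    """Rewrite the dataset paths in a .proto config's text.

    Assumption: every .pkl info file lives directly in `data_root`, so only
    the file name of the old path is kept and its directory is replaced.
    """
    def _replace(m):
        field, sep, old = m.group(1), m.group(2), m.group(3)
        if field == "kitti_root_path":
            new = data_root
        else:
            new = posixpath.join(data_root, posixpath.basename(old))
        return '{}{}"{}"'.format(field, sep, new)

    return PATH_FIELD.sub(_replace, text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python this_script.py path/to/config.proto /abs/path/to/KITTI_DATASET_ROOT
    cfg_path, root = sys.argv[1], sys.argv[2]
    with open(cfg_path) as f:
        updated = repoint_config(f.read(), root)
    with open(cfg_path, "w") as f:
        f.write(updated)
```

Because the file is edited in place, keeping a copy of the original config first is advisable.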

Train

cd ~/Frustum-Pointpillars/second
python ./pytorch/train.py train --config_path=./configs/pointpillars/car/xyres_16.proto --model_dir=/path/to/model_dir
  • If you want to train a new model, make sure "/path/to/model_dir" doesn't exist.
  • If "/path/to/model_dir" does exist, training will be resumed from the last checkpoint.
  • Training only supports a single GPU.
  • Training uses a batch size of 2, which should fit in memory on most standard GPUs.
  • On a single GTX 1080 Ti, training xyres_16 takes approximately 20 hours for 160 epochs.
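Since an existing model_dir silently switches training into resume mode, a small wrapper can make that choice explicit. This is a hypothetical sketch (`build_train_cmd` and its `resume` flag are illustrative, not options of train.py); it only assembles the same `train.py train --config_path=... --model_dir=...` call shown above and should be run from the same directory as the `cd` command.

```python
import subprocess
import sys
from pathlib import Path


def build_train_cmd(config_path, model_dir, resume=False):
    """Assemble the train.py call shown above.

    Refuses to reuse an existing model_dir unless resume=True, because
    train.py resumes from the last checkpoint found in that directory.
    """
    model_dir = Path(model_dir)
    if model_dir.exists() and not resume:
        raise FileExistsError(
            "{} already exists; pass resume=True to continue training from it".format(model_dir))
    return [sys.executable, "./pytorch/train.py", "train",
            "--config_path={}".format(config_path),
            "--model_dir={}".format(model_dir)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python this_script.py ./configs/pointpillars/car/xyres_16.proto /path/to/model_dir
    subprocess.run(build_train_cmd(sys.argv[1], sys.argv[2]), check=True)
```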

Evaluate

cd ~/Frustum-Pointpillars/second/
python pytorch/train.py evaluate --config_path=configs/pointpillars/car/xyres_16.proto --model_dir=/path/to/model_dir
  • Detection results will be saved in model_dir/eval_results/step_xxx.
  • By default, results are stored in a result.pkl file. To save them in the official KITTI label format, use --pickle_result=False.
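With --pickle_result=False, the detections should follow the official KITTI label format, presumably one text file per frame with one object per line. In that format each line has 15 fields (type, truncated, occluded, alpha, 2D box, dimensions, location, rotation_y), and detection results append a 16th field, the score. The parser below is a hypothetical sketch of reading such files (`KittiObject` and `parse_kitti_line` are illustrative names).

```python
from typing import List, NamedTuple, Optional


class KittiObject(NamedTuple):
    obj_type: str            # e.g. "Car", "Pedestrian", "Cyclist"
    truncated: float         # 0 (not truncated) to 1 (fully truncated)
    occluded: int            # 0 visible, 1 partly, 2 largely occluded, 3 unknown
    alpha: float             # observation angle, radians
    bbox: List[float]        # 2D box: left, top, right, bottom (pixels)
    dimensions: List[float]  # height, width, length (metres)
    location: List[float]    # x, y, z in camera coordinates (metres)
    rotation_y: float        # rotation around the camera Y axis, radians
    score: Optional[float] = None  # present only in detection results


def parse_kitti_line(line):
    """Parse one line of a KITTI label/result file (15 or 16 fields)."""
    f = line.split()
    if len(f) not in (15, 16):
        raise ValueError("expected 15 or 16 fields, got {}".format(len(f)))
    return KittiObject(
        obj_type=f[0],
        truncated=float(f[1]),
        occluded=int(float(f[2])),
        alpha=float(f[3]),
        bbox=[float(x) for x in f[4:8]],
        dimensions=[float(x) for x in f[8:11]],
        location=[float(x) for x in f[11:14]],
        rotation_y=float(f[14]),
        score=float(f[15]) if len(f) == 16 else None,
    )


def read_kitti_file(path):
    """Read every non-empty line of one per-frame .txt file."""
    with open(path) as fh:
        return [parse_kitti_line(line) for line in fh if line.strip()]
```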

Results

ICCV workshop presentation: https://www.youtube.com/watch?v=0z7OPPRsqTk

Citation

If you find this project useful in your research, please star this GitHub repository and consider citing our work:

@INPROCEEDINGS{9607424,
  author={Paigwar, Anshul and Sierra-Gonzalez, David and Erkent, Özgür and Laugier, Christian},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)}, 
  title={Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection using RGB Camera and LiDAR}, 
  year={2021},
  pages={2926-2933},
  doi={10.1109/ICCVW54120.2021.00327}
}

Contribution

Contributions to this repository are welcome. Feel free to contact us about any bugs or issues.
