Implementation of VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection in TensorFlow 2.3.1.
This project is inspired by the article of Gopalakrishna Adusumilli and the work of David Stephane.
- Modeling Techniques: VoxelNet, Lidar 3D Object Detection, PointNet, Convolutional autoencoder-decoder.
- Image Processing Techniques: Lidar data, Point cloud.
- Tech Stack: Python (3.7)
- Libraries: TensorFlow (2.3.1), OpenCV, Numba.
- Clone this repository
- Compile the Cython module
$ python3 setup.py build_ext --inplace
Data preparation (Please refer to Notebook)
Here we use the KITTI Vision Benchmark dataset.
- Download the 3D KITTI detection dataset from here. The data to download includes:
- Velodyne point clouds (29 GB): input data to VoxelNet
- Training labels of object data set (5 MB): input label to VoxelNet
- Camera calibration matrices of object data set (16 MB): for visualization of predictions
- Left color images of object data set (12 GB): for visualization of predictions
- In this project, we use cropped point cloud data for training and validation. Points that fall outside the image coordinates are removed. Update the directories in data/crop.py and run data/crop.py to generate the cropped data. Note that the cropped point cloud data will overwrite the raw point cloud data.
- Split the training set into training and validation sets according to the protocol here, and rearrange the folders to have the following structure:
└── DATA_DIR
    ├── training    <-- training data
    |   ├── image_2
    |   ├── label_2
    |   └── velodyne
    └── validation  <-- evaluation data
        ├── image_2
        ├── label_2
        └── velodyne
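The cropping step described above (removing points that fall outside the image coordinates) can be sketched roughly as follows. This is a minimal illustration, not the code in data/crop.py: the function name `crop_to_image_fov` is hypothetical, the matrices are assumed to follow the usual KITTI calibration conventions (P2 as a 3x4 projection, R0_rect and Tr_velo_to_cam padded to 4x4 homogeneous form), and the demo values are made up.

```python
import numpy as np


def crop_to_image_fov(points, P2, R0_rect, Tr_velo_to_cam, img_width, img_height):
    """Keep only lidar points whose projection lands inside the camera image.

    points:          (N, 4) array of x, y, z, reflectance in the Velodyne frame.
    P2:              (3, 4) camera projection matrix.
    R0_rect:         (4, 4) rectification matrix in homogeneous form.
    Tr_velo_to_cam:  (4, 4) Velodyne-to-camera transform in homogeneous form.
    """
    n = points.shape[0]
    # Homogeneous lidar coordinates, shape (N, 4).
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    # Points in the rectified camera frame, shape (4, N).
    cam = R0_rect @ Tr_velo_to_cam @ xyz1.T
    # Project onto the image plane, shape (3, N).
    pix = P2 @ cam
    with np.errstate(divide="ignore", invalid="ignore"):
        u = pix[0] / pix[2]
        v = pix[1] / pix[2]
        inside = (
            (cam[2] > 0)                      # in front of the camera
            & (u >= 0) & (u < img_width)      # within image columns
            & (v >= 0) & (v < img_height)     # within image rows
        )
    return points[inside]


# Tiny synthetic demo (made-up calibration, 100x100 image).
Tr = np.array([[0.0, -1.0, 0.0, 0.0],   # camera x = -lidar y
               [0.0, 0.0, -1.0, 0.0],   # camera y = -lidar z
               [1.0, 0.0, 0.0, 0.0],    # camera z =  lidar x
               [0.0, 0.0, 0.0, 1.0]])
R0 = np.eye(4)
P2 = np.array([[100.0, 0.0, 50.0, 0.0],
               [0.0, 100.0, 50.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
demo_points = np.array([[10.0, 0.0, 0.0, 0.5],    # straight ahead: kept
                        [-10.0, 0.0, 0.0, 0.5],   # behind the sensor: dropped
                        [10.0, 20.0, 0.0, 0.5]])  # far to the left: dropped
cropped = crop_to_image_fov(demo_points, P2, R0, Tr, 100, 100)
```

Because the project overwrites the raw point clouds with the cropped ones, it may be worth keeping a backup copy of the velodyne folder before running the real script.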
Train (Please refer to Notebook)
Run train.py. You can find the meaning of each hyperparameter in the script file.
!python train.py \
--strategy="all" \
--n_epochs=160 \
--batch_size=2 \
--learning_rate=0.001 \
--small_addon_for_BCE=1e-6 \
--max_gradient_norm=5 \
--alpha_bce=1.5 \
--beta_bce=1 \
--huber_delta=3 \
--dump_vis="no" \
--data_root_dir="../DATA_DIR/T_DATA" \
--model_dir="model" \
--model_name="model6" \
--dump_test_interval=40 \
--summary_interval=10 \
--summary_val_interval=10 \
--summary_flush_interval=20 \
--ckpt_max_keep=10
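To give a feel for what the loss-related flags control, here is a NumPy sketch of a VoxelNet-style loss as described in the VoxelNet paper: a weighted binary cross-entropy over positive and negative anchors plus a Huber (smooth L1) regression term over positive anchors. The mapping of `--alpha_bce`, `--beta_bce`, `--small_addon_for_BCE` and `--huber_delta` onto `alpha`, `beta`, `eps` and `delta` is an assumption based on the flag names; train.py remains the authority on what each flag does, and the function names below are hypothetical.

```python
import numpy as np


def huber(residual, delta=3.0):
    """Element-wise Huber loss: quadratic near zero, linear beyond |r| > delta."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * residual ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)


def voxelnet_style_loss(prob, pos_mask, neg_mask, reg_pred, reg_target,
                        alpha=1.5, beta=1.0, eps=1e-6, delta=3.0):
    """Weighted BCE on anchors plus Huber regression on positive anchors.

    prob:       (A,) predicted objectness probability per anchor.
    pos_mask:   (A,) 1.0 for positive anchors, 0.0 otherwise.
    neg_mask:   (A,) 1.0 for negative anchors, 0.0 otherwise.
    reg_pred:   (A, 7) predicted box residuals.
    reg_target: (A, 7) target box residuals.
    """
    # Avoid division by zero when a sample has no positive or negative anchors.
    n_pos = max(float(pos_mask.sum()), 1.0)
    n_neg = max(float(neg_mask.sum()), 1.0)
    # eps keeps log() finite when a probability saturates at 0 or 1.
    cls_pos = -np.sum(pos_mask * np.log(prob + eps)) / n_pos
    cls_neg = -np.sum(neg_mask * np.log(1.0 - prob + eps)) / n_neg
    # Regression is only counted on positive anchors.
    reg = np.sum(pos_mask[..., None] * huber(reg_pred - reg_target, delta)) / n_pos
    return alpha * cls_pos + beta * cls_neg + reg


# Two anchors: one positive predicted at 0.9, one negative predicted at 0.1.
prob = np.array([0.9, 0.1])
pos_mask = np.array([1.0, 0.0])
neg_mask = np.array([0.0, 1.0])
reg_pred = np.zeros((2, 7))
reg_target = np.zeros((2, 7))
loss = voxelnet_style_loss(prob, pos_mask, neg_mask, reg_pred, reg_target)
```

With `alpha` larger than `beta`, missed objects are penalized more heavily than false alarms, which is one way to counter the large imbalance between negative and positive anchors.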
- Run predict.py.
!python predict.py \
--strategy="all" \
--batch_size=2 \
--dump_vis="yes" \
--data_root_dir="../DATA_DIR/T_DATA/" \
--dataset_to_test="validation" \
--model_dir="model" \
--model_name="model6" \
--ckpt_name=""
- Then, run the kitti_eval project to compute the model's performance.
./kitti_eval/evaluate_object_3d_offline [DATA_DIR]/validation/label_2 ./predictions [output file]
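The evaluator compares the ground-truth files in label_2 with the files in ./predictions. In the standard KITTI object format, each line describes one object with 15 whitespace-separated fields, and result files append a confidence score as a 16th field. The sketch below parses such lines for quick inspection; it assumes predict.py writes its output in this standard format, which is not confirmed here, and `KittiObject` and `parse_kitti_line` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class KittiObject(NamedTuple):
    obj_type: str                              # e.g. "Car", "Pedestrian", "Cyclist"
    truncated: float                           # 0 (fully visible) to 1 (truncated)
    occluded: int                              # occlusion state
    alpha: float                               # observation angle in radians
    bbox: Tuple[float, float, float, float]    # 2D box: left, top, right, bottom (pixels)
    dimensions: Tuple[float, float, float]     # 3D size: height, width, length (metres)
    location: Tuple[float, float, float]       # 3D position x, y, z in camera coordinates
    rotation_y: float                          # rotation around the camera y-axis
    score: Optional[float]                     # present only in prediction files


def parse_kitti_line(line: str) -> KittiObject:
    """Parse one line of a KITTI label file (15 fields) or result file (16 fields)."""
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    nums = [float(x) for x in fields[1:]]
    return KittiObject(
        obj_type=fields[0],
        truncated=nums[0],
        occluded=int(nums[1]),
        alpha=nums[2],
        bbox=tuple(nums[3:7]),
        dimensions=tuple(nums[7:10]),
        location=tuple(nums[10:13]),
        rotation_y=nums[13],
        score=nums[14] if len(nums) == 15 else None,
    )


# Illustrative label line (values chosen for the example only).
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
label = parse_kitti_line(sample)
prediction = parse_kitti_line(sample + " 0.95")
```

Checking a few prediction files this way before running the evaluator can catch formatting problems, such as a missing score column, early.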
For this 3D object detection task, both the classification loss and the regression loss were used as performance metrics. Here is a snapshot of the log directory visualized with TensorBoard.
The predicted bounding boxes are decent. The model was trained on only 100 images for 16 epochs; prediction quality should improve considerably when it is trained on more images.
Happy Learning!