We open-source e3d (Efficient Methods for 3D Deep Learning), a repository containing our recent advances in efficient 3D point cloud understanding.
[2020-09] [NEW!!] We release the baseline training code for SPVCNNs and MinkowskiNets in the spvnas repo; please have a look!
[2020-08] Please check out our ECCV 2020 tutorial on AutoML for Efficient 3D Deep Learning, which summarizes the methods released in this codebase. We also made the hands-on tutorial available on Colab:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mit-han-lab/e3d/blob/master/tutorial/e3d.ipynb)
[2020-07] Our paper Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution is accepted to ECCV 2020.
[2020-03] Our work PVCNN is deployed on MIT Driverless racing cars; please check out this video.
[2019-12] We give a spotlight talk on PVCNN at NeurIPS 2019.
Please run:
git clone https://github.com/mit-han-lab/e3d --recurse-submodules
to clone this codebase. If you forget to add the --recurse-submodules
flag when cloning, please run:
git submodule update --init
after you run:
git clone https://github.com/mit-han-lab/e3d
To use all the codebases presented in this repository, please follow the instructions in each folder.
[Tutorial at ECCV NAS Workshop] [ECCV 10-min Talk] [MIT News] [State-of-the-Art on SemanticKITTI Leaderboard]
@inproceedings{tang2020searching,
title = {Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution},
author = {Tang, Haotian* and Liu, Zhijian* and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
booktitle = {European Conference on Computer Vision},
year = {2020}
}
We release the PyTorch code of our paper SPVNAS: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution (arXiv version). It achieves state-of-the-art performance on the SemanticKITTI leaderboard and outperforms MinkowskiNet with a 3x speedup and an 8x reduction in MACs.
- Prerequisites
- Data Preparation
- SemanticKITTI
- Code
- Pretrained Models
- SemanticKITTI
- Testing Pretrained Models
- Visualizations
- Training
- Searching
The code is built with the following libraries:
- Python >= 3.6
- PyTorch >= 1.6
- tensorboardX >= 1.2
- tqdm
- torchpack
- torchsparse
Please follow the instructions from here to download the SemanticKITTI dataset (both the KITTI Odometry dataset and the SemanticKITTI labels) and extract all the files in the `sequences` folder to `/dataset/semantic-kitti`. You should see 22 folders (00, 01, …, 21), each with subfolders named `velodyne` and `labels`.
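As a quick sanity check of the layout described above, the short script below verifies that all 22 sequence folders and their subfolders exist. It is a minimal sketch that assumes the dataset root `/dataset/semantic-kitti` stated above; labels are only checked for the train/val sequences (00-10) used in this repo, since the test sequences do not ship with public labels.

```python
import os

# Minimal sanity check for the SemanticKITTI layout described above (assumed root path).
root = "/dataset/semantic-kitti"

for seq in [f"{i:02d}" for i in range(22)]:
    seq_dir = os.path.join(root, seq)
    # Every sequence should contain raw LiDAR scans.
    assert os.path.isdir(os.path.join(seq_dir, "velodyne")), f"missing velodyne in {seq_dir}"
    # Labels are only needed for the train/val sequences (00-10) used in this repo.
    if int(seq) <= 10:
        assert os.path.isdir(os.path.join(seq_dir, "labels")), f"missing labels in {seq_dir}"

print("SemanticKITTI folder structure looks good.")
```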
The code (under the `spvnas` folder) is based on torchsparse, a high-performance GPU computing library for 3D sparse convolution operations. It is significantly faster than the existing implementation MinkowskiEngine and supports more diverse operations, such as the new 3D module proposed in this paper, Sparse Point-Voxel Convolution (SPVConv for short; see spvnas/core/models/semantic_kitti/spvcnn.py for details):
# x: sparse (voxel) tensor, z: point tensor holding per-point features
x_new = point_to_voxel(x, z)                 # scatter point features onto the sparse voxel grid
x_new = sparse_conv_net(x_new)               # aggregate neighborhood context with sparse convolutions
z_new = voxel_to_point(x_new, z) + point_transforms(z.F)  # gather back to points and fuse with the point-wise branch
We further propose 3D-NAS to automatically search for efficient 3D architectures built with SPVConv. The 3D-NAS super network implementation can be found in spvnas/core/models/semantic_kitti/spvnas.py.
We share the pretrained models for MinkowskiNets, our manually designed SPVCNN models, and the SPVNAS models found by our 3D-NAS pipeline. All the pretrained models are available in the Model Zoo. Currently, we release the models trained on sequences 00-07 and 09-10 and evaluated on sequence 08.
Models | #Params (M) | MACs (G) | mIoU (paper) | mIoU (reprod.) |
---|---|---|---|---|
SemanticKITTI_val_MinkUNet@29GMACs | 5.5 | 28.5 | 58.9 | 59.3 |
SemanticKITTI_val_SPVCNN@30GMACs | 5.5 | 30.0 | 60.7 | 60.8 ± 0.5 |
SemanticKITTI_val_SPVNAS@20GMACs | 3.3 | 20.0 | 61.5 | - |
SemanticKITTI_val_SPVNAS@25GMACs | 4.5 | 24.6 | 62.9 | - |
SemanticKITTI_val_MinkUNet@46GMACs | 8.8 | 45.9 | 60.3 | 60.0 |
SemanticKITTI_val_SPVCNN@47GMACs | 8.8 | 47.4 | 61.4 | 61.5 ± 0.2 |
SemanticKITTI_val_SPVNAS@35GMACs | 7.0 | 34.7 | 63.5 | - |
SemanticKITTI_val_MinkUNet@114GMACs | 21.7 | 113.9 | 61.1 | 61.9 |
SemanticKITTI_val_SPVCNN@119GMACs | 21.8 | 118.6 | 63.8 | 63.7 ± 0.4 |
SemanticKITTI_val_SPVNAS@65GMACs | 10.8 | 64.5 | 64.7 | - |
Here, the results are reproduced using 8 NVIDIA RTX 2080Ti GPUs. The run-to-run variation for a single model is due to floating-point atomic addition operations in our torchsparse CUDA backend.
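If you want to load a pretrained model programmatically rather than through the evaluation script below, a sketch along the following lines should work. It assumes the model-zoo helpers bundled in the `spvnas` folder expose constructors keyed by the model names in the table; the exact helper name is an assumption here.

```python
# Sketch only: assumes the spvnas folder is the working directory (or on PYTHONPATH)
# and that its model_zoo module provides a constructor keyed by model name.
import torch
from model_zoo import spvnas_specialized  # helper name assumed, not guaranteed

model = spvnas_specialized('SemanticKITTI_val_SPVNAS@65GMACs')
model = model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()
```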
After `cd spvnas`, you can run the following command to test the performance of SPVNAS / SPVCNN / MinkUNet models:
torchpack dist-run -np [num_of_gpus] python evaluate.py configs/semantic_kitti/default.yaml --name [name_of_net]
For example, to test the model SemanticKITTI_val_SPVNAS@65GMACs on one GPU, you may run
torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
After `cd spvnas`, you can run the following command (on a headless server) to visualize the predictions of SPVNAS / SPVCNN / MinkUNet models:
xvfb-run --server-args="-screen 0 1024x768x24" python visualize.py
If you are running the code on a computer with a monitor, you may also run directly:
python visualize.py
The visualizations will be generated in `sample_data/outputs`.
We currently release the training code for the manually designed baseline models (SPVCNN and MinkowskiNets). After `cd spvnas`, you may run the following command to train a model from scratch:
torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/[model name]/[config name].yaml
For example, to train the model SemanticKITTI_val_SPVCNN@30GMACs, you may run
torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml
The code related to architecture search will be coming soon!
@inproceedings{liu2019pvcnn,
title={Point-Voxel CNN for Efficient 3D Deep Learning},
author={Liu, Zhijian and Tang, Haotian and Lin, Yujun and Han, Song},
booktitle={Advances in Neural Information Processing Systems},
year={2019}
}
[Paper] [NeurIPS 2019 spotlight talk] [Deploy on MIT Driverless] [NVIDIA Jetson Community Project Spotlight] [Playlist] [Website]
In PVCNN, we present a new efficient 3D deep learning module, Point-Voxel Convolution (PVConv), as illustrated below.
PVConv takes advantage of the regularity of the volumetric representation and the small memory footprint of the point cloud representation, achieving significantly faster inference and much lower memory consumption compared with both point-based and voxel-based 3D deep learning methods.
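To make the idea concrete, here is a minimal, simplified sketch of a PVConv-style block in PyTorch. It is not the repository's implementation (which uses optimized CUDA voxelization/devoxelization kernels); the module name is illustrative, a coarse dense voxel grid with nearest-voxel gathering stands in for the optimized kernels, and the fine-grained point branch is a shared MLP.

```python
import torch
import torch.nn as nn


class NaivePVConv(nn.Module):
    """Simplified PVConv-style block: a point-wise MLP branch fused with a
    coarse voxel-convolution branch. Illustrative only; the real PVCNN code
    uses optimized voxelization/devoxelization kernels."""

    def __init__(self, in_channels, out_channels, resolution=16):
        super().__init__()
        self.r = resolution
        # Point branch: shared MLP keeps fine-grained, per-point detail.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, 1),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Voxel branch: dense 3D convolution aggregates neighborhood context.
        self.voxel_conv = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, features, coords):
        # features: (B, C, N) per-point features; coords: (B, 3, N), normalized to [0, 1].
        B, C, N = features.shape
        idx = (coords * (self.r - 1)).round().long().clamp(0, self.r - 1)
        flat = idx[:, 0] * self.r * self.r + idx[:, 1] * self.r + idx[:, 2]  # (B, N)

        # Voxelize: average the features of points falling into the same voxel.
        grid = features.new_zeros(B, C, self.r ** 3)
        count = features.new_zeros(B, 1, self.r ** 3)
        grid.scatter_add_(2, flat.unsqueeze(1).expand(-1, C, -1), features)
        count.scatter_add_(2, flat.unsqueeze(1), torch.ones_like(features[:, :1]))
        grid = (grid / count.clamp(min=1)).view(B, C, self.r, self.r, self.r)

        # Convolve on the grid, then devoxelize by gathering each point's voxel feature.
        voxel_out = self.voxel_conv(grid).view(B, -1, self.r ** 3)
        voxel_feats = voxel_out.gather(2, flat.unsqueeze(1).expand(-1, voxel_out.size(1), -1))

        # Fuse coarse voxel context with fine point-wise features.
        return voxel_feats + self.point_mlp(features)


# Example: 2 point clouds with 1024 points and 32 input channels each.
pvconv = NaivePVConv(32, 64, resolution=16)
out = pvconv(torch.randn(2, 32, 1024), torch.rand(2, 3, 1024))
print(out.shape)  # torch.Size([2, 64, 1024])
```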
Here is a demo comparing PVCNN and PointNet in 3D shape part segmentation on NVIDIA Jetson Nano:
To test the PVCNN models, please run `cd pvcnn` first and download our pretrained models as indicated in the README file. Then, please use this command template:
python train.py [config-file] --devices [gpu-ids] --evaluate --configs.evaluate.best_checkpoint_path [path to the model checkpoint]
to run the evaluation. For example, to run inference on S3DIS with GPUs 0 and 1, you can run:
python train.py configs/s3dis/pvcnn/area5.py --devices 0,1 --evaluate --configs.evaluate.best_checkpoint_path s3dis.pvcnn.area5.c1.pth.tar