Efficient Methods for 3D Deep Learning

e3d: Efficient Methods for 3D Deep Learning

We open source e3d: Efficient Methods for 3D Deep Learning, a repository containing our recent advances in efficient 3D point cloud understanding.

News

[2020-09] [NEW!!] We release baseline training code for SPVCNNs and MinkowskiNets in the spvnas repo. Please have a look!

[2020-08] Please check out our ECCV 2020 tutorial on AutoML for Efficient 3D Deep Learning, which summarizes the methods released in this codebase. We also made the hands-on tutorial available in Colab:

Open in Colab: https://colab.research.google.com/github/mit-han-lab/e3d/blob/master/tutorial/e3d.ipynb

[2020-07] Our paper Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution was accepted to ECCV 2020.

[2020-03] Our work PVCNN is deployed on MIT Driverless racing cars. Please check out this video.

[2019-12] We gave a spotlight talk on PVCNN at NeurIPS 2019.

Installation

Please run:

git clone https://github.com/mit-han-lab/e3d --recurse-submodules

to clone this codebase. If you cloned without the --recurse-submodules flag, that is, you ran:

git clone https://github.com/mit-han-lab/e3d

then run the following from inside the cloned repository to fetch the submodules:

git submodule update --init

To use the codebases presented in this repository, please follow the instructions in each folder.

SPVNAS: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

[Tutorial at ECCV NAS Workshop] [ECCV 10-min Talk] [MIT News] [State-of-the-Art on SemanticKITTI Leaderboard]

@inproceedings{
  tang2020searching,
  title = {Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution},
  author = {Tang, Haotian* and Liu, Zhijian* and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
  booktitle = {European Conference on Computer Vision},
  year = {2020}
}

Overview

We release the PyTorch code of our paper SPVNAS: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution (arXiv version). It achieves state-of-the-art performance on the SemanticKITTI leaderboard and outperforms MinkowskiNet with a 3x speedup and an 8x reduction in MACs.

SPVNAS Uniformly Outperforms MinkowskiNet

SPVNAS Achieves Lower Error on Safety Critical Small Objects

SPVNAS is Much Faster Than MinkowskiNet

Prerequisites

The code is built with the following libraries:

Data Preparation

SemanticKITTI

Please follow the instructions from here to download the SemanticKITTI dataset (both the KITTI Odometry dataset and the SemanticKITTI labels) and extract all the files in the sequences folder to /dataset/semantic-kitti. You should see 22 folders named 00, 01, …, 21, each with subfolders named velodyne and labels.
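The expected layout can be checked with a few lines of Python before running anything else. This is a minimal sketch, not part of the repository: the function name find_missing_dirs and its parameters are hypothetical, and the folder names are taken from the description above.

```python
from pathlib import Path


def find_missing_dirs(root, num_sequences=22, subfolders=("velodyne", "labels")):
    """Return the expected <root>/<sequence>/<subfolder> directories that do not exist.

    Hypothetical helper: sequence folders are assumed to be named 00, 01, ..., 21
    and to sit directly under `root`, as described in the text above.
    """
    root = Path(root)
    missing = []
    for i in range(num_sequences):
        for sub in subfolders:
            path = root / f"{i:02d}" / sub
            if not path.is_dir():
                missing.append(path)
    return missing
```

Calling find_missing_dirs("/dataset/semantic-kitti") returns an empty list when every expected folder is present; otherwise it lists the paths to re-check after extraction.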

Code

The code (under the spvnas folder) is based on torchsparse, a high-performance GPU computing library for 3D sparse convolution operations. It is significantly faster than the existing implementation, MinkowskiEngine, and supports more diverse operations, such as the new 3D module proposed in this paper, Sparse Point-Voxel Convolution (SPVConv for short; see spvnas/core/models/semantic_kitti/spvcnn.py for details):

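# x: sparse tensor, z: point tensor
x_new = point_to_voxel(x, z)      # voxelize: move point features onto the sparse voxel grid
x_new = sparse_conv_net(x_new)    # voxel branch: sparse convolutions
# devoxelize and fuse with the point branch
z_new = voxel_to_point(x_new, z) + point_transforms(z.F)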
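To make the data flow of the pseudocode above concrete, here is a toy, pure-Python sketch with scalar features. It is not the torchsparse implementation and rests on several assumptions: voxelization averages the features of points sharing a voxel, the voxel branch is replaced by an arbitrary per-voxel function voxel_fn (a real sparse convolution also mixes neighboring voxels), and devoxelization copies the feature of the containing voxel rather than interpolating. All function signatures here are hypothetical.

```python
from collections import defaultdict


def point_to_voxel(points, feats, voxel_size):
    """Average the features of all points that fall into the same voxel."""
    buckets = defaultdict(list)
    keys = []
    for (x, y, z), f in zip(points, feats):
        # Integer voxel coordinate of the point (floor division handles negatives).
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        keys.append(key)
        buckets[key].append(f)
    voxels = {k: sum(v) / len(v) for k, v in buckets.items()}
    return voxels, keys


def voxel_to_point(voxels, keys):
    """Copy each voxel feature back to the points inside it (nearest-voxel gather)."""
    return [voxels[k] for k in keys]


def spvconv_sketch(points, feats, voxel_size, voxel_fn, point_fn):
    """Toy version of the four-line pseudocode: voxel branch plus point branch."""
    voxels, keys = point_to_voxel(points, feats, voxel_size)
    voxels = {k: voxel_fn(v) for k, v in voxels.items()}  # stand-in for sparse_conv_net
    gathered = voxel_to_point(voxels, keys)
    return [g + point_fn(f) for g, f in zip(gathered, feats)]


points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.0, 0.0)]
feats = [1.0, 3.0, 10.0]
out = spvconv_sketch(points, feats, 1.0, voxel_fn=lambda v: 2 * v, point_fn=lambda f: f)
print(out)  # [5.0, 7.0, 30.0]
```

The first two points share a voxel, so they receive the same coarse voxel feature, while the point branch keeps their individual features distinct. This is the intuition behind combining the two branches.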

We further propose 3D-NAS to automatically search for efficient 3D architectures built with SPVConv. The 3D-NAS super network implementation can be found in spvnas/core/models/semantic_kitti/spvnas.py.

Pretrained Models

SemanticKITTI

We share the pretrained models for MinkowskiNets, our manually designed SPVCNN models and also SPVNAS models found by our 3D-NAS pipeline. All the pretrained models are available in the Model Zoo. Currently, we release the models trained on sequences 00-07 and 09-10 and evaluated on sequence 08.
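The split described above can be written down explicitly. This is a small sketch under the assumption that the sequences in question are 00 through 10; the variable names are illustrative and not taken from the repository.

```python
# Sequences 00-10, as zero-padded folder names (assumed range, see the lead-in).
SEQUENCES = [f"{i:02d}" for i in range(11)]

# Sequence 08 is held out for evaluation; the rest are used for training.
VAL_SEQUENCES = ["08"]
TRAIN_SEQUENCES = [s for s in SEQUENCES if s not in VAL_SEQUENCES]

print(TRAIN_SEQUENCES)
```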

| Models | #Params (M) | MACs (G) | mIoU (paper) | mIoU (reprod.) |
| --- | --- | --- | --- | --- |
| SemanticKITTI_val_MinkUNet@29GMACs | 5.5 | 28.5 | 58.9 | 59.3 |
| SemanticKITTI_val_SPVCNN@30GMACs | 5.5 | 30.0 | 60.7 | 60.8 ± 0.5 |
| SemanticKITTI_val_SPVNAS@20GMACs | 3.3 | 20.0 | 61.5 | - |
| SemanticKITTI_val_SPVNAS@25GMACs | 4.5 | 24.6 | 62.9 | - |
| SemanticKITTI_val_MinkUNet@46GMACs | 8.8 | 45.9 | 60.3 | 60.0 |
| SemanticKITTI_val_SPVCNN@47GMACs | 8.8 | 47.4 | 61.4 | 61.5 ± 0.2 |
| SemanticKITTI_val_SPVNAS@35GMACs | 7.0 | 34.7 | 63.5 | - |
| SemanticKITTI_val_MinkUNet@114GMACs | 21.7 | 113.9 | 61.1 | 61.9 |
| SemanticKITTI_val_SPVCNN@119GMACs | 21.8 | 118.6 | 63.8 | 63.7 ± 0.4 |
| SemanticKITTI_val_SPVNAS@65GMACs | 10.8 | 64.5 | 64.7 | - |

Here, the results are reproduced using 8 NVIDIA RTX 2080Ti GPUs. The run-to-run variation for a single model comes from floating-point atomic additions in our torchsparse CUDA backend: the order in which the additions are applied is not fixed, and floating-point addition is not associative.
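Why the order of additions matters can be seen without a GPU. The snippet below is a generic illustration of floating-point non-associativity in plain Python, not a reproduction of the CUDA kernels: the same three numbers summed in two different orders give two slightly different results.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one possible accumulation order
right = a + (b + c)  # another order over the same values

print(left == right)       # False
print(abs(left - right))   # a tiny but nonzero difference
```

Over many accumulations and training steps, such small differences can add up to the spread reported in the table.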

Testing Pretrained Models

After cd spvnas, you can run the following command to test the performance of SPVNAS / SPVCNN / MinkUNet models.

torchpack dist-run -np [num_of_gpus] python evaluate.py configs/semantic_kitti/default.yaml --name [name_of_net]

For example, to test the model SemanticKITTI_val_SPVNAS@65GMACs on one GPU, you may run:

torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
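The model table above reports mean intersection-over-union (mIoU). As a reminder of how this metric is commonly computed from a confusion matrix, here is a minimal sketch. The function name is hypothetical, and details such as ignored labels or the handling of absent classes in evaluate.py may differ from what is assumed here.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[i][j] is the number of points with ground-truth class i predicted as class j.
    Classes that appear in neither the ground truth nor the predictions are skipped
    (an assumption of this sketch).
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                          # class-c points predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp     # other points predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Two classes: IoU is 3/5 for class 0 and 5/7 for class 1.
print(mean_iou([[3, 1], [1, 5]]))
```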

Visualizations

After cd spvnas, you can run the following command (on a headless server) to visualize the predictions of SPVNAS / SPVCNN / MinkUNet models:

xvfb-run --server-args="-screen 0 1024x768x24" python visualize.py

If you are running the code on a computer with a monitor, you may also directly run:

python visualize.py

The visualizations will be generated in sample_data/outputs.

Training

SemanticKITTI

We currently release the training code for manually-designed baseline models (SPVCNN and MinkowskiNets). You may run the following code after cd spvnas to train the model from scratch:

torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/[model name]/[config name].yaml

For example, to train the model SemanticKITTI_val_SPVCNN@30GMACs, you may run:

torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml

Searching

The code related to architecture search is coming soon!

PVCNN

@inproceedings{liu2019pvcnn,
  title={Point-Voxel CNN for Efficient 3D Deep Learning},
  author={Liu, Zhijian and Tang, Haotian and Lin, Yujun and Han, Song},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}

[Paper] [NeurIPS 2019 spotlight talk] [Deploy on MIT Driverless] [NVIDIA Jetson Community Project Spotlight] [Playlist] [Website]

Overview

In PVCNN, we present a new efficient 3D deep learning module, Point-Voxel Convolution (PVConv), as illustrated below.

PVConv takes advantage of the regularity of the volumetric representation and the small footprint of the point cloud representation, achieving significantly faster inference and a much lower memory footprint compared with both point cloud-based and voxel-based 3D deep learning methods.

Here is a demo comparing PVCNN and PointNet in 3D shape part segmentation on NVIDIA Jetson Nano:

Evaluation

To test the PVCNN models, please run cd pvcnn first and download our pretrained models as indicated in the README file. Then run the following command template:

python train.py [config-file] --devices [gpu-ids] --evaluate --configs.evaluate.best_checkpoint_path [path to the model checkpoint]

to run the evaluation. For example, to run inference on S3DIS with GPUs 0 and 1, you can run:

python train.py configs/s3dis/pvcnn/area5.py --devices 0,1 --evaluate --configs.evaluate.best_checkpoint_path s3dis.pvcnn.area5.c1.pth.tar