Self-Supervised Pretraining of 3D Features on any Point-Cloud

This code provides a PyTorch implementation and pretrained models for DepthContrast, as described in the paper Self-Supervised Pretraining of 3D Features on any Point-Cloud.

DepthContrast is an easy to implement self-supervised method that works across model architectures, input data formats, indoor/outdoor 3D, single/multi-view 3D data. Similarly to 2D contrastive approaches, DepthContrast learns representations by comparing transformations of a 3D pointcloud/voxel. It does not require any multi-view information between frames, such as point-to-point correspondances. It makes our framework generalize to any 3D pointcloud or voxel input. DepthContrast pretrains high capacity models for 3D recognition tasks, and leverages large-scale 3D data. It shows state-of-the-art performance on detection and segmentation benchmarks, outperforming all prior work on detection.

Model Zoo

We release our PointNet++ and MinkowskiEngine UNet models pretrained with DepthContrast with the hope that other researchers might also benefit from these pretrained backbones. Due to license issue, models pretrained on Waymo cannot be released. For PointnetMSG and Spconv-UNet models, we encourage the researchers to train by themselves using the provided script.

We first provide PointNet++ models with different sizes.

network	epochs	batch-size	ScanNet Det with VoteNet	url	args
PointNet++-1x	150	1024	61.9	model	config
PointNet++-2x	200	1024	63.3	model	config
PointNet++-3x	150	1024	64.1	model	config
PointNet++-4x	100	1024	63.8	model	config

The ScanNet detection evaluation metric is mAP at IOU=0.25. You need to change the scale parameter in the config files accordingly.

We provide the joint training results here, with different epochs. We use epoch 400 to generate the results reported in the paper.

Backbone	epochs	batch-size	url	args
PointNet++ & MinkowskiEngine UNet	300	1024	model	config
PointNet++ & MinkowskiEngine UNet	400	1024	model	config
PointNet++ & MinkowskiEngine UNet	500	1024	model	config
PointNet++ & MinkowskiEngine UNet	600	1024	model	config
PointNet++ & MinkowskiEngine UNet	700	1024	model	config

Running DepthContrast unsupervised training

Requirements

You can use the requirements.txt to setup the environment. You can do:

pip install -r requirements.txt

conda install --file requirements.txt

For voxel representation, you have to install MinkowskiEngine. Please see here on how to install it.

For the lidar point cloud pretraining, we use models from OpenPCDet. To install OpenPCDet, you need to install spconv, which is a bit difficult to install and may not be compatible with MinkowskiEngine. Thus, we suggest you use a different conda environment for lidar point cloud pretraining.

Singlenode training

DepthContrast is very simple to implement and experiment with.

To experiment with it on one GPU, you can simply do:

python main.py /path/to/cfg/file

For multi-gpu training in one node, you can run:

python main.py /path/to/cfg_file --multiprocessing-distributed --world-size 1 --rank 0 --ngpus number_of_gpus

For submitting it to a slurm node, you can use ./scripts/pretrain_node1.sh. For hyper-parameter tuning, please change the config files.

Multinode training

Distributed training is available via Slurm. We provide several SBATCH scripts to reproduce our results. For example, to train DepthContrast on 4 nodes and 32 GPUs with a batch size of 1024 run:

sbatch ./scripts/pretrain_node4.sh /path/to/cfg_file

Note that you might need to remove the copyright header from the sbatch file to launch it.

Evaluating models

For votenet finetuning, please checkout this repo for more details.

For H3DNet finetuning, please checkout this repo for more details.

For voxel scene segmentation task finetuning, please checkout this repo for more details.

For lidar point cloud object detection task finetuning, please checkout this repo for more details.

Common Issues

For help or issues using DepthContrast, please submit a GitHub issue.

License

See the LICENSE file for more details.

Citation

If you find this repository useful in your research, please cite:

@inproceedings{zhang_depth_contrast,
  title={Self-Supervised Pretraining of 3D Features on any Point-Cloud},
  author={Zhang, Zaiwei and Girdhar, Rohit and Joulin, Armand and Misra, Ishan},
  journal={arXiv preprint arXiv:2101.02691},
  year={2021}
}

smartcai/DepthContrast