This is the official implementation of the WACV 2023 PVL workshop paper *Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds*.
PyTorch >= 1.9 is recommended for better support of gradient checkpointing. (Alternatively, in torch < 1.9 you can manually replace the `torch.utils.checkpoint` interface with the one from torch >= 1.9.)
The implementation builds upon code from SST, which in turn is based on MMDetection3D. Please refer to their getting_started guide to get the environment up and running.
The training procedure is the same as in SST; refer to `./tools/train.py` or `./tools/dist_train.sh` for details. An example launch with one of the pre-training configs is shown below.
In `./configs/sst_masked/`, we provide the configs used for pre-training. For instance, `sst_nuscenes_2sweeps-remove_close-cpf-scaled_masked_200e_ED8.py` is the config used for pre-training the two-sweep model, and `sst_nuscenes_10sweeps-remove_close-cpf-scaled_masked_200e.py` is the config used for pre-training the ten-sweep model. The config names encode the setup:

- Configs with the suffix `intensity.py` use intensity information.
- `remove_close` refers to the removal of points hitting the ego vehicle.
- `cpf` refers to using all three pre-training tasks (Chamfer, #points, and "fake"/empty voxels).
- `200e` refers to the number of epochs used for pre-training.
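For example, assuming the standard MMDetection3D launch arguments (config path first, then the number of GPUs for the distributed script), pre-training the two-sweep model could be started as follows; the GPU count is only an example:

```bash
# Single-GPU pre-training of the two-sweep model
python tools/train.py configs/sst_masked/sst_nuscenes_2sweeps-remove_close-cpf-scaled_masked_200e_ED8.py

# Distributed pre-training on 8 GPUs
bash tools/dist_train.sh configs/sst_masked/sst_nuscenes_2sweeps-remove_close-cpf-scaled_masked_200e_ED8.py 8
```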
For pre-training on varying fractions of the dataset, use configs with the suffix `fifths.py`. For instance, `sst_nuscenes_2sweeps-remove_close-cpf-scaled_masked_200e_ED8_1fifths.py` is the config used for pre-training the two-sweep model on the first fifth of the dataset.
For pre-training with a subset of the pre-training tasks, use variations of `cpf`: e.g., `cf` refers to using Chamfer and fake voxels, `pf` refers to using #points and fake voxels, etc.
After pre-training, we can use the pre-trained checkpoints to initialize the 3D object detection model. Again, training is started with `tools/train.py` or `tools/dist_train.sh`. However, to load the pre-trained weights, we need to use the `--cfg-options` option with `load_from`, for instance `tools/dist_train.sh $CONFIG $GPUS --cfg-options load_from=$PATH_TO_PRETRAINED_CHECKPOINT`. For training models from scratch, simply omit `load_from`. For evaluation every 12th epoch, one can use `--cfg-options evaluation.metric=nuscenes`.
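Putting these options together, a launch might look like the following sketch; `$CONFIG`, `$GPUS`, and the checkpoint path are placeholders to fill in:

```bash
# Fine-tune from a pre-trained checkpoint, evaluating every 12th epoch
bash tools/dist_train.sh $CONFIG $GPUS \
    --cfg-options load_from=$PATH_TO_PRETRAINED_CHECKPOINT evaluation.metric=nuscenes

# Train from scratch: omit load_from
bash tools/dist_train.sh $CONFIG $GPUS --cfg-options evaluation.metric=nuscenes
```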
Configs for fine-tuning can be found in `./configs/sst_refactor`. We use `sst_2sweeps_VS0.5_WS16_ED8_epochs288.py` and `sst_10sweeps_VS0.5_WS16_ED8_epochs288.py` for training the two- and ten-sweep models. Configs with the suffix `intensity.py` use intensity information. In the config names:

- `VS0.5` refers to a voxel size of 0.5.
- `WS16` refers to a window size of 16.
- `ED8` refers to an encoder depth of 8.
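For instance, fine-tuning the two-sweep model from a pre-trained checkpoint could look like this; the checkpoint path under `work_dirs` and the GPU count are hypothetical:

```bash
# Initialize the detector with pre-trained weights (example path)
bash tools/dist_train.sh configs/sst_refactor/sst_2sweeps_VS0.5_WS16_ED8_epochs288.py 8 \
    --cfg-options load_from=work_dirs/sst_nuscenes_2sweeps_pretrain/latest.pth
```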
Similar to the pre-training configs, we provide versions for different dataset sizes: e.g., `1fifths`, `1twentieth`, and `1hundreth` use 20%, 5%, and 1% of the available data, respectively.
We provide a Dockerfile to build an image.
```bash
# build an image with PyTorch 1.9, CUDA 10.2
cd docker
bash build.sh
bash run.sh ENTER_YOUR_VOLUME_PATH
```
This project is based on the following codebases.