3DTrans: Autonomous Driving Transfer Learning Codebase

3DTrans is a lightweight, simple, self-contained open-source codebase for exploring the Autonomous Driving-oriented Transfer Learning Techniques, which mainly consists of four functions at present:

Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
Active Domain Adaptation (ADA) for 3D Point Clouds
Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
Multi-dateset Domain Fusion (MDF) for 3D Point Clouds

This project is developed and maintained by Autonomous Driving Group [at] Shanghai AI Laboratory (ADLab).

Overview

Update News
TODO List
3DTrans Framework Introduction
- What does 3DTrans Codebase do?
Installation for 3DTrans
Getting Started
Model Zoo:
Point Cloud Pre-training for Autonomous Driving Task
Visualization Tools for 3DTrans
Acknowledge
Citation

🔥 News 🔥

Two papers developed using our 3DTrans were accepted by CVPR-2023: Uni3D and Bi3D. (updated on Mar. 2023).
Our 3DTrans supported the Semi-Supervised Domain Adaptation (SSDA) for 3D Object Detection (updated on Nov. 2022).
Our 3DTrans supported the Active Domain Adaptation (ADA) of 3D Object Detection for achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
Our 3DTrans supported several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous driving-related model adaptation and transfer.
Our 3DTrans supported the Multi-domain Dataset Fusion (MDF) of 3D Object Detection for enabling the existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
Our 3DTrans supported the Unsupervised Domain Adaptation (UDA) of 3D Object Detection for deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
The self-learning for unlabeled point clouds and new data augmentation operations (such as Object Scaling (ROS) and Size Normalization (SN)) has been supported for 3DTrans, which can achieve a performance boost based on the dataset prior of object-size statistics
Our 3DTrans was initialized based on the OpenPCD-v0.5.2 version (updated on May 2022).

🚀 We are actively updating this repository currently, and more cross-dataset fusion solutions (including domain attention and mixture-of-experts) and more low-cost data sampling strategy will be supported by 3DTrans in the furture, which aims to boost the generalization ability and adaptability of the existing state-of-the-art models. 🚀

We expect this repository will inspire the research of 3D model generalization since it will push the limits of perceptual performance. 🗼

💪 TODO List 💪

For ADA module, need to add the sequence-level data selection policy (to meet the requirement of practical annotation process).
Provide experimental findings for the AD-related 3D pre-training (Our ongoing research, which currently achieves promising pre-training results towards downstream tasks by exploiting large-scale unlabeled data in ONCE dataset using 3DTrans).

3DTrans Framework Introduction

What does `3DTrans` Codebase do?

Rapid Target-domain Adaptation: 3DTrans can boost the 3D perception model's adaptability for an unseen domain only using unlabeled data. For example, ST3D supported by 3DTrans repository achieves a new state-of-the-art model transfer performance for many adaptation tasks, further boosting the transferability of detection models. Besides, 3DTrans develops many new UDA techniques to solve different types of domain shifts (such as LiDAR-induced shifts or object-size-induced shifts), which includes Pre-SN, Post-SN, and range-map downsampling retraining.
Annotation-saving Target-domain Transfer: 3DTrans can select the most informative subset of an unseen domain and label them at a minimum cost. For example, 3DTrans develops the Bi3D, which selects partial-yet-important target data and labels them at a minimum cost, to achieve a good trade-off between high performance and low annotation cost. Besides, 3DTrans has integrated several typical transfer learning techniques into the 3D object detection pipeline. For example, we integrate the TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher for supporting autonomous driving-related model transfer.
Joint Training on Multiple 3D Datasets: 3DTrans can perform the multi-dataset 3D object detection task. For example, 3DTrans develops the Uni3D for multi-dataset 3D object detection, which enables the current 3D baseline models to effectively learn from multiple off-the-shelf 3D datasets, boosting the reusability of 3D data from different autonomous driving manufacturers.
Multi-dataset Support: 3DTrans provides a unified interface of dataloader, data augmentor, and data processor for multiple public benchmarks, including Waymo, nuScenes, ONCE, Lyft, and KITTI, etc, which is beneficial to study the transferability and generality of 3D perception models among different datasets. Besides, in order to eliminate the domain gaps between different manufacturers and obtain generalizable representations, 3DTrans has integrated typical unlabeled pre-training techniques for giving a better parameter initialization of the current 3D baseline models. For example, we integrate the PointContrast and SESS to support point cloud-based pre-training task.
Extensibility for Multiple Models: 3DTrans makes the baseline model have the ability of cross-domain/dataset safe transfer and multi-dataset joint training. Without making major changes of the code and 3D model structure, a single-dataset 3D baseline model can be successfully adapted to an unseen domain or dataset by using our 3DTrans.
3DTrans is developed based on OpenPCDet codebase, which can be easily integrated with the models developing using OpenPCDet repository. Thanks for their valuable open-sourcing!

The relations of different modules (UDA, ADA, SSDA, MDF, Pre-training) in 3DTrans: The basic model uses UDA/ADA/SSDA (target-oriented adaptation techniques to alleviate inter-domain shift and perform the inference). Then, MDF can use the results of multi-source domains for pre-training to generate a unified dataset. Finally, pre-training task are utilized to learn generalizable representations from multi-source domains, which can effectively boost the performance of basic model.

Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

Please refer to Readme for Datasets to prepare the dataset and convert the data into the 3DTrans format. Besides, 3DTrans supports the reading and writing data from Ceph Petrel-OSS, please refer to Readme for Datasets for more details.
Please refer to Readme for UDA for understanding the problem definition of UDA and performing the UDA adaptation process.
Please refer to Readme for ADA for understanding the problem definition of ADA and performing the ADA adaptation process.
Please refer to Readme for SSDA for understanding the problem definition of SSDA and performing the SSDA adaptation process.
Please refer to Readme for MDF for understanding the problem definition of MDF and performing the MDF joint-training process.

Model Zoo:

We could not provide the Waymo-related pretrained models due to Waymo Dataset License Agreement, but you could easily achieve similar performance by training with the corresponding configs.

UDA Results:

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.

All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
For Waymo dataset training, we train the model using 20% data.
The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
Pre-SN represents that we perform the SN (statistical normalization) operation during the pre-training source-only model stage.
Post-SN represents that we perform the SN (statistical normalization) operation during the adaptation stage.

	training time	Adaptation	Car@R40	download
PointPillar	~7.1 hours	Source-only with SN	74.98 / 49.31	-
PointPillar	~0.6 hours	Pre-SN	81.71 / 57.11	model-57M
PV-RCNN	~23 hours	Source-only with SN	69.92 / 60.17	-
PV-RCNN	~23 hours	Source-only	74.42 / 40.35	-
PV-RCNN	~3.5 hours	Pre-SN	84.00 / 74.57	model-156M
PV-RCNN	~1 hours	Post-SN	84.94 / 75.20	model-156M
Voxel R-CNN	~16 hours	Source-only with SN	75.83 / 55.50	-
Voxel R-CNN	~16 hours	Source-only	64.88 / 19.90	-
Voxel R-CNN	~2.5 hours	Pre-SN	82.56 / 67.32	model-201M
Voxel R-CNN	~2.2 hours	Post-SN	85.44 / 76.78	model-201M
PV-RCNN++	~20 hours	Source-only with SN	67.22 / 56.50	-
PV-RCNN++	~20 hours	Source-only	67.68 / 20.82	-
PV-RCNN++	~2.2 hours	Post-SN	86.86 / 79.86	model-193M

ADA Results:

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
For Waymo dataset training, we train the model using 20% data.
The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.

	training time	Adaptation	Car@R40	download
PV-RCNN	~23h@4 A100	Source Only	67.95 / 27.65	-
PV-RCNN	~1.5h@2 A100	Bi3D (1% annotation budget)	87.12 / 78.03	Model-58M
PV-RCNN	~10h@2 A100	Bi3D (5% annotation budget)	89.53 / 81.32	Model-58M
PV-RCNN	~1.5h@2 A100	TQS	82.00 / 72.04	Model-58M
PV-RCNN	~1.5h@2 A100	CLUE	82.13 / 73.14	Model-50M
PV-RCNN	~10h@2 A100	Bi3D+ST3D	87.83 / 81.23	Model-58M
Voxel R-CNN	~16h@4 A100	Source Only	64.87 / 19.90	-
Voxel R-CNN	~1.5h@2 A100	Bi3D (1% annotation budget)	88.09 / 79.14	Model-72M
Voxel R-CNN	~6h@2 A100	Bi3D (5% annotation budget)	90.18 / 81.34	Model-72M
Voxel R-CNN	~1.5h@2 A100	TQS	78.26 / 67.11	Model-72M
Voxel R-CNN	~1.5h@2 A100	CLUE	81.93 / 70.89	Model-72M

SSDA Results:

We report the target domain results on Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and Waymo-to-ONCE adaptation using ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.

For Waymo-to-nuScenes adaptation, we employ 4 NVIDIA A100 GPUs for model training.
The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
For Waymo dataset training, we train the model using 20% data.
second_5%_FT denotes that we use 5% nuScenes training data to fine-tune the Second model.
second_5%_SESS denotes that we utilize the SESS: Self-Ensembling Semi-Supervised method to adapt our baseline model.
second_5%_PS denotes that we fine-tune the source-only model to nuScenes datasets using 5% labeled data, and perform the pseudo-labeling process on the remaining 95% unlabeled nuScenes data.

	training time	Adaptation	Car@R40	download
Second	~11 hours	source-only(Waymo)	27.85 / 16.43	-
Second	~0.4 hours	second_5%_FT	45.95 / 26.98	model-61M
Second	~1.8 hours	second_5%_SESS	47.77 / 28.74	model-61M
Second	~1.7 hours	second_5%_PS	47.72 / 29.37	model-61M
PV-RCNN	~24 hours	source-only(Waymo)	40.31 / 23.32	-
PV-RCNN	~1.0 hours	pvrcnn_5%_FT	49.58 / 34.86	model-150M
PV-RCNN	~5.5 hours	pvrcnn_5%_SESS	49.92 / 35.28	model-150M
PV-RCNN	~5.4 hours	pvrcnn_5%_PS	49.84 / 35.07	model-150M
PV-RCNN++	~16 hours	source-only(Waymo)	31.96 / 19.81	-
PV-RCNN++	~1.2 hours	pvplus_5%_FT	49.94 / 34.28	model-185M
PV-RCNN++	~4.2 hours	pvplus_5%_SESS	51.14 / 35.25	model-185M
PV-RCNN++	~3.6 hours	pvplus_5%_PS	50.84 / 35.39	model-185M

For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
PS denotes that we pseudo-label the unlabeled ONCE and re-train the model on pseudo-labeled data.
SESS denotes that we utilize the SESS method to adapt the baseline.
For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, 0.5 for Vehicle, Pedestrian, Cyclist.

	Training ONCE Data	Methods	Vehicle@AP	Pedestrian@AP	Cyclist@AP	download
Centerpoint	Labeled (4K)	Train from scracth	74.93	46.21	67.36	model-96M
Centerpoint_Pede	Labeled (4K)	PS	-	49.14	-	model-96M
PV-RCNN++	Labeled (4K)	Train from scracth	79.78	35.91	63.18	model-188M
PV-RCNN++	Small Dataset (100K)	SESS	80.02	46.24	66.41	model-188M

MDF Results:

Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on Waymo and nuScenes datasets, and evaluated on Waymo using the mAP/mAHPH LEVEL_2 and nuScenes using the BEV/3D AP. Please refer to Readme for MDF for more results.

All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
For Waymo dataset training, we train the model using 20% training data for saving training time.
PV-RCNN-nuScenes represents that we train the PV-RCNN model only using nuScenes dataset, and PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset. Besides, PV-RCNN-DT denotes the domain attention-aware multi-dataset training.

Baseline	MDF Methods	Waymo@Vehicle	Waymo@Pedestrian	Waymo@Cyclist	nuScenes@Car	nuScenes@Pedestrian	nuScenes@Cyclist
PV-RCNN-nuScenes	only nuScenes	35.59 / 35.21	3.95 / 2.55	0.94 / 0.92	57.78 / 41.10	24.52 / 18.56	10.24 / 8.25
PV-RCNN-Waymo	only Waymo	66.49 / 66.01	64.09 / 58.06	62.09 / 61.02	32.99 / 17.55	3.34 / 1.94	0.02 / 0.01
PV-RCNN-DM	Direct Merging	57.82 / 57.40	48.24 / 42.81	54.63 / 53.64	48.67 / 30.43	12.66 / 8.12	1.67 / 1.04
PV-RCNN-Uni3D	Uni3D	66.98 / 66.50	65.70 / 59.14	61.49 / 60.43	60.77 / 42.66	27.44 / 21.85	13.50 / 11.87
PV-RCNN-DT	Domain Attention	67.27 / 66.77	65.86 / 59.38	61.38 / 60.34	60.83 / 43.03	27.46 / 22.06	13.82 / 11.52

Baseline	MDF Methods	Waymo@Vehicle	Waymo@Pedestrian	Waymo@Cyclist	nuScenes@Car	nuScenes@Pedestrian	nuScenes@Cyclist
Voxel-RCNN-nuScenes	only nuScenes	31.89 / 31.65	3.74 / 2.57	2.41 / 2.37	53.63 / 39.05	22.48 / 17.85	10.86 / 9.70
Voxel-RCNN-Waymo	only Waymo	67.05 / 66.41	66.75 / 60.83	63.13 / 62.15	34.10 / 17.31	2.99 / 1.69	0.05 / 0.01
Voxel-RCNN-DM	Direct Merging	58.26 / 57.87	52.72 / 47.11	50.26 / 49.50	51.40 / 31.68	15.04 / 9.99	5.40 / 3.87
Voxel-RCNN-Uni3D	Uni3D	66.76 / 66.29	66.62 / 60.51	63.36 / 62.42	60.18 / 42.23	30.08 / 24.37	14.60 / 12.32
Voxel-RCNN-DT	Domain Attention	66.96 / 66.50	68.23 / 62.00	62.57 / 61.64	60.42 / 42.81	30.49 / 24.92	15.91 / 13.35

Baseline	MDF Methods	Waymo@Vehicle	Waymo@Pedestrian	Waymo@Cyclist	nuScenes@Car	nuScenes@Pedestrian	nuScenes@Cyclist
PV-RCNN++ DM	Direct Merging	63.79 / 63.38	55.03 / 49.75	59.88 / 58.99	50.91 / 31.46	17.07 / 12.15	3.10 / 2.20
PV-RCNN++-Uni3D	Uni3D	68.55 / 68.08	69.83 / 63.60	64.90 / 63.91	62.51 / 44.16	33.82 / 27.18	22.48 / 19.30
PV-RCNN++-DT	Domain Attention	68.51 / 68.05	69.81 / 63.58	64.39 / 63.43	62.33 / 44.16	33.44 / 26.94	21.64 / 18.52

Point Cloud Pre-training for Autonomous Driving Task

Based on our research progress on the cross-domain adaptation of multiple autonomous driving datasets, we can utilize the multi-source datasets for performing the pre-training task. Here, we present several unsupervised and self-supervised pre-training implementations (including PointContrast).

Please refer to Readme for point cloud pre-training for starting the journey of 3D perception model pre-training.
💪 💪 We are actively exploring the possibility of boosting the 3D pre-training generalization ability. The corresponding code and pre-training checkpoints are coming soon in 3DTrans-v0.2.0.

Visualization Tools for 3DTrans

Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results of ground truth of a selected scene.
Visualization Demo:

Acknowledge

Our code is heavily based on OpenPCDet v0.5.2. Thanks OpenPCDet Development Team for their awesome codebase.
Our pre-training 3D point cloud task is based on ONCE Dataset. Thanks ONCE Development Team for their inspiring data release.

Our Papers:

Our Team:

A Team Home for Member Information and Profile, Project Link

Related Projects

Welcome to see SUG for exploring the possibilities of adapting a model to multiple unseen domains and studying how to leverage the feature’s multi-modal information residing in a single dataset. You can also use SUG to obtain generalizable features by means of a single dataset, which can obtain good results on unseen domains or datasets.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, and Yu Qiao},
  booktitle={CVPR},
  year={2023},
}

@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, and Yu Qiao},
  booktitle={CVPR},
  year={2023},
}

@article{fei2023unida3d,
  title={UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline},
  author={Ben Fei, Siyuan Huang, Jiakang Yuan, Botian Shi, Bo Zhang, Weidong Yang, Min Dou, and Yikang Li},
  journal={arXiv preprint arXiv:2212.10390},
  year={2023}
}

Lycokie/3DTrans