Multimodal Range View Based Semantic Segmentation

Code for our project "Multimodal Range View Based Semantic Segmentation" for the course "Deep Learning for 3D Perception" at the Technical University of Munich, under the supervision of Prof. Angela Dai.

Preparation:

Download the SemanticKITTI dataset from its official website.
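
SemanticKITTI stores each LiDAR scan as a flat binary file of float32 values (x, y, z, remission per point), typically under dataset/sequences/XX/velodyne/. The following is a minimal sketch for inspecting a downloaded scan with NumPy; the function name `load_scan` is hypothetical and not part of this repository.

```python
import numpy as np


def load_scan(path):
    """Load a SemanticKITTI velodyne scan (.bin) as an (N, 4) float32 array.

    Hypothetical helper: each point is stored as four consecutive float32
    values, x, y, z, remission.
    """
    scan = np.fromfile(path, dtype=np.float32)
    return scan.reshape(-1, 4)
```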

Usage:

Train:

Lidar backbone with Range Augmentations (RA):

512 x 64 range-view (RV) resolution:

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/cenet_512.yml \
    -n cenet_512_RA

1024 x 64 RV resolution (fine-tuned from the 512 x 64 checkpoint, as recommended by the authors of CENet):

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/cenet_1024.yml \
    -p /path/to/cenet_512_RA -n cenet_1024_RA
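
The two resolutions above refer to the width x height of the range image onto which each scan is projected. A minimal sketch of the spherical projection commonly used by range-view methods (RangeNet++ style) is shown below. It assumes the Velodyne HDL-64E field of view usually configured for SemanticKITTI (+3° up, -25° down); the function `range_projection` and its parameters are hypothetical and not this repository's API, and the actual projection is controlled by the config files.

```python
import numpy as np


def range_projection(points, width=512, height=64, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 4) cloud [x, y, z, remission] to a (height, width) range image.

    Hypothetical sketch: returns the range image (-1 where no point falls)
    and the per-point (row, col) pixel indices.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = abs(fov_down) + abs(fov_up)

    xyz = points[:, :3]
    depth = np.linalg.norm(xyz, axis=1)
    # Horizontal angle (yaw) and vertical angle (pitch) of each point;
    # the denominator is guarded for points at the sensor origin.
    yaw = -np.arctan2(xyz[:, 1], xyz[:, 0])
    pitch = np.arcsin(np.clip(xyz[:, 2] / np.maximum(depth, 1e-8), -1.0, 1.0))

    # Normalized image coordinates in [0, 1].
    u = 0.5 * (yaw / np.pi + 1.0)
    v = 1.0 - (pitch + abs(fov_down)) / fov

    cols = np.clip(np.floor(u * width), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v * height), 0, height - 1).astype(np.int64)

    # Keep the closest point per pixel; ufunc.at is well-defined for repeated indices.
    flat = np.full(height * width, np.inf, dtype=np.float32)
    np.minimum.at(flat, rows * width + cols, depth.astype(np.float32))
    range_image = flat.reshape(height, width)
    range_image[np.isinf(range_image)] = -1.0
    return range_image, rows, cols
```

Doubling the width from 512 to 1024 halves the horizontal angle covered by each pixel, so fewer points collide in the same pixel, at the cost of a larger input to the network.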

RGB backbone fine-tuning on the SemanticKITTI dataset with range-view labels:

For use with the 512 x 64 model:

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/mask2former_512.yml \
    -n mask2former_512

For use with the 1024 x 64 model:

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/mask2former_1024.yml \
    -n mask2former_1024

Fusion Model:

512 x 64 range-view (RV) resolution:

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/fusion_512.yml \
    -n fusion_512

1024 x 64 RV resolution:

python train.py -d /path/to/SemanticKITTI/dataset -ac config/arch/fusion_1024.yml \
    -n fusion_1024

Inference and Evaluation:

Inference:

python infer.py -d /path/to/SemanticKITTI/dataset -l /path/to/save/predictions/in \
    -m /path/to/trained_model
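
Predictions, like the SemanticKITTI ground truth, are assumed here to be written as .label files, where each point's entry is a uint32 whose lower 16 bits hold the semantic label and upper 16 bits the instance id. A minimal sketch for decoding such a file; `read_label_file` is a hypothetical helper, not part of this repository.

```python
import numpy as np


def read_label_file(path):
    """Split a SemanticKITTI-style .label file into semantic and instance ids.

    Hypothetical helper: lower 16 bits = semantic label,
    upper 16 bits = instance id.
    """
    raw = np.fromfile(path, dtype=np.uint32)
    semantic = raw & 0xFFFF
    instance = raw >> 16
    return semantic, instance
```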

Evaluation:

Lidar and fusion models:

python evaluate_iou.py -d /path/to/SemanticKITTI/dataset -p /path/to/predictions

RGB models:

python evaluate_iou_rgb.py -d /path/to/SemanticKITTI/dataset -p /path/to/predictions
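
As a rough illustration of the intersection over union (IoU) metric these scripts report, the sketch below builds a confusion matrix and computes per-class IoU = TP / (TP + FP + FN), skipping points whose ground truth is an ignored class (assumed to be class 0, "unlabeled"). It is a generic sketch; the function `mean_iou` is hypothetical, and evaluate_iou.py may differ in details such as how the ignored class is handled.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=0):
    """Per-class IoU and mean IoU from integer label arrays (hypothetical sketch)."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # Drop points whose ground truth is the ignored class.
    keep = gt != ignore_index
    conf = np.bincount(num_classes * gt[keep] + pred[keep],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp  # belongs to the class, but missed
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)
    # Average over all classes except the ignored one.
    valid = np.arange(num_classes) != ignore_index
    return iou, iou[valid].mean()
```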

Visualization Examples:

Visualize GT:

python visualize.py -w kitti -d /path/to/SemanticKITTI/dataset -s which_sequences

Visualize Predictions:

python visualize.py -w kitti -d /path/to/SemanticKITTI/dataset -p /path/to/predictions \
    -s which_sequences

Pretrained Models and Logs:

Our pre-trained models can be found here.

Acknowledgments:

Our codebase is based on CENet. For the fusion model we use code from SwinFusion, and for the RGB backbone we follow the Hugging Face implementation of Mask2Former. For initialization, we use Mask2Former models pre-trained on the Cityscapes dataset for semantic segmentation.

nschi00/rangeview-rgb-lidar-fusion