Delivering Arbitrary-Modal Semantic Segmentation (CVPR 2023)



To conduct arbitrary-modal semantic segmentation, we create the DeLiVER benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. It includes four severe weather conditions and five sensor failure cases, designed to exploit modal complementarity and resolve partial sensor outages. We also present CMNeXt, an arbitrary cross-modal segmentation model that scales from 1 to 81 modalities across the DeLiVER, KITTI-360, MFNet, NYU Depth V2, UrbanLF, and MCubeS datasets.

For more details, please check our arXiv paper.


Updates

  • 03/2023, init repository.
  • 04/2023, release front-view DeLiVER. Download from GoogleDrive.
  • 04/2023, release CMNeXt model weights. Download from GoogleDrive.

DeLiVER dataset



The DeLiVER multimodal dataset. (a) It covers four adverse conditions out of five conditions (cloudy, foggy, night-time, rainy, and sunny). Apart from normal cases, each condition has five corner cases (MB: Motion Blur; OE: Over-Exposure; UE: Under-Exposure; LJ: LiDAR-Jitter; and EL: Event Low-resolution). Each sample has six views, and each view has four modalities and two labels (semantic and instance). (b) Data statistics. (c) Data distribution of the 25 semantic classes.

DELIVER splitting


Data folder structure

Download the DeLiVER dataset from GoogleDrive (~12.2 GB).

The data/DELIVER folder is structured as:

├── depth
│   ├── cloud
│   │   ├── test
│   │   │   ├── MAP_10_point102
│   │   │   │   ├── 045050_depth_front.png
│   │   │   │   ├── ...
│   │   ├── train
│   │   └── val
│   ├── fog
│   ├── night
│   ├── rain
│   └── sun
├── event
├── hha
├── img
├── lidar
└── semantic
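As a quick sanity check after downloading, the layout above can be walked to count frames per condition and split. This is a minimal sketch, not part of the repository: the function name `count_frames` is hypothetical, and it assumes every modality folder follows the same `<condition>/<split>/<sequence>/<frame>.png` nesting that the tree shows for `depth`.

```python
from collections import Counter
from pathlib import Path


def count_frames(root, modality="depth"):
    """Count PNG frames per (condition, split) under <root>/<modality>.

    Assumed layout (taken from the tree above):
        <modality>/<condition>/<split>/<sequence>/<frame>.png
    """
    counts = Counter()
    for png in Path(root, modality).glob("*/*/*/*.png"):
        # parts[-1] is the file, [-2] the sequence, [-3] the split, [-4] the condition
        condition, split = png.parts[-4], png.parts[-3]
        counts[(condition, split)] += 1
    return counts
```

Calling `count_frames("data/DELIVER")` returns a `Counter` keyed by pairs such as `("cloud", "test")`; an empty result usually means the root path or the modality name is wrong.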

CMNeXt model


The CMNeXt architecture follows the Hub2Fuse paradigm with asymmetric branches: for example, Multi-Head Self-Attention (MHSA) blocks in the RGB branch and our Parallel Pooling Mixer (PPX) blocks in the accompanying branch. At the hub step, the Self-Query Hub selects informative features from the supplementary modalities. At the fusion step, the feature rectification module (FRM) and feature fusion module (FFM) fuse the features. Between stages, the features of each modality are restored by adding the fused feature. The fused features from all four stages are forwarded to the segmentation head for the final prediction.
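The data flow described above can be illustrated with a toy sketch. It is not the CMNeXt implementation: features are plain lists of floats, `select_hub` (element-wise maximum) and `fuse` (average) are placeholder stand-ins for the Self-Query Hub and for FRM + FFM, and adding the fused feature back to every branch is one reading of the caption. All names are hypothetical, and at least one supplementary modality is assumed.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # toy stand-in for a feature map


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum of two toy features."""
    return [x + y for x, y in zip(a, b)]


def select_hub(extras: Sequence[Feature]) -> Feature:
    """Placeholder for the Self-Query Hub: keep the element-wise maximum."""
    return [max(values) for values in zip(*extras)]


def fuse(rgb: Feature, hub: Feature) -> Feature:
    """Placeholder for FRM + FFM: average the two inputs."""
    return [(x + y) / 2 for x, y in zip(rgb, hub)]


def hub2fuse_forward(
    rgb: Feature,
    extras: Sequence[Feature],
    rgb_blocks: Sequence[Callable[[Feature], Feature]],
    aux_blocks: Sequence[Callable[[Feature], Feature]],
) -> List[Feature]:
    """Run the hub step and the fusion step once per stage."""
    fused_per_stage = []
    for rgb_block, aux_block in zip(rgb_blocks, aux_blocks):
        rgb = rgb_block(rgb)                       # RGB branch
        extras = [aux_block(x) for x in extras]    # accompanying branch
        hub = select_hub(extras)                   # hub step
        fused = fuse(rgb, hub)                     # fusion step
        fused_per_stage.append(fused)
        # Restore each modality by adding the fused feature before the next stage.
        rgb = add(rgb, fused)
        extras = [add(x, fused) for x in extras]
    return fused_per_stage  # one fused feature per stage, for the segmentation head
```

With four identity blocks per branch, the function returns four fused features, mirroring the four-stage output that is passed to the segmentation head.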


Environment

conda env create -f environment.yml
conda activate cmnext
# Optional: install apex by following https://github.com/NVIDIA/apex

Data preparation

Prepare six datasets:

  • DELIVER, for RGB-Depth-Event-LiDAR semantic segmentation.
  • KITTI-360, for RGB-Depth-Event-LiDAR semantic segmentation.
  • NYU Depth V2, for RGB-Depth semantic segmentation.
  • MFNet, for RGB-Thermal semantic segmentation.
  • UrbanLF, for light-field segmentation based on sub-aperture images.
  • MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.

Then, all datasets are structured as:

data
├── DELIVER
│   ├── img
│   ├── hha
│   ├── event
│   ├── lidar
│   └── semantic
├── KITTI-360
│   ├── data_2d_raw
│   ├── data_2d_hha
│   ├── data_2d_event
│   ├── data_2d_lidar
│   └── data_2d_semantics
├── NYUDepthv2
│   ├── RGB
│   ├── HHA
│   └── Label
├── MFNet
│   ├── rgb
│   ├── ther
│   └── labels
├── UrbanLF
│   ├── Syn
│   └── real
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp
│   ├── polL_dolp
│   ├── NIR_warped
│   └── SS

For RGB-Depth, the HHA format is generated from the depth image.

Model Zoo

DELIVER dataset

Model-Modal         #Params (M)   GFLOPs   mIoU (%)   Weights
CMNeXt-RGB          25.79         38.93    57.20      GoogleDrive
CMNeXt-RGB-E        58.69         62.94    57.48      GoogleDrive
CMNeXt-RGB-L        58.69         62.94    58.04      GoogleDrive
CMNeXt-RGB-D        58.69         62.94    63.58      GoogleDrive
CMNeXt-RGB-D-E      58.72         64.19    64.44      GoogleDrive
CMNeXt-RGB-D-L      58.72         64.19    65.50      GoogleDrive
CMNeXt-RGB-D-E-L    58.73         65.42    66.30      GoogleDrive
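To read the DELIVER table above in terms of what each added modality contributes, the mIoU gain over the RGB-only baseline can be computed directly from the listed values. This is a small arithmetic sketch; the helper name `gains_over_rgb` is hypothetical and the numbers are copied from the table.

```python
# mIoU values copied from the DELIVER table above.
DELIVER_MIOU = {
    "CMNeXt-RGB": 57.20,
    "CMNeXt-RGB-E": 57.48,
    "CMNeXt-RGB-L": 58.04,
    "CMNeXt-RGB-D": 63.58,
    "CMNeXt-RGB-D-E": 64.44,
    "CMNeXt-RGB-D-L": 65.50,
    "CMNeXt-RGB-D-E-L": 66.30,
}


def gains_over_rgb(scores, baseline="CMNeXt-RGB"):
    """Return the mIoU difference of each entry relative to the baseline."""
    base = scores[baseline]
    return {
        name: round(value - base, 2)  # round away floating-point noise
        for name, value in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, gain in gains_over_rgb(DELIVER_MIOU).items():
        print(f"{name}: +{gain:.2f} mIoU")
```

On these numbers, adding depth accounts for most of the improvement (+6.38), while the full RGB-D-E-L combination reaches +9.10 over RGB alone.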

KITTI360 dataset

Model-Modal         mIoU (%)   Weights
CMNeXt-RGB          67.04      GoogleDrive
CMNeXt-RGB-E        66.13      GoogleDrive
CMNeXt-RGB-L        65.26      GoogleDrive
CMNeXt-RGB-D        65.09      GoogleDrive
CMNeXt-RGB-D-E      67.73      GoogleDrive
CMNeXt-RGB-D-L      66.55      GoogleDrive
CMNeXt-RGB-D-E-L    67.84      GoogleDrive

NYU Depth V2

Model-Modal              mIoU (%)   Weights
CMNeXt-RGB-D (MiT-B4)    56.9       GoogleDrive


MFNet dataset

Model-Modal              mIoU (%)   Weights
CMNeXt-RGB-T (MiT-B4)    59.9       GoogleDrive


UrbanLF dataset

There are real and synthetic datasets.

Model-Modal        Real mIoU (%)   Weights       Syn mIoU (%)   Weights
CMNeXt-RGB         82.20           GoogleDrive   78.53          GoogleDrive
CMNeXt-RGB-LF8     83.22           GoogleDrive   80.74          GoogleDrive
CMNeXt-RGB-LF33    82.62           GoogleDrive   80.98          GoogleDrive
CMNeXt-RGB-LF80    83.11           GoogleDrive   81.02          GoogleDrive


MCubeS dataset

Model-Modal         mIoU (%)   Weights
CMNeXt-RGB          48.16      GoogleDrive
CMNeXt-RGB-A        48.42      GoogleDrive
CMNeXt-RGB-A-D      49.48      GoogleDrive
CMNeXt-RGB-A-D-N    51.54      GoogleDrive


Training

Before training, please download the pre-trained SegFormer weights, such as checkpoints/pretrained/segformer/mit_b2.pth.

checkpoints/pretrained/segformer
├── mit_b2.pth
└── mit_b4.pth

To train a CMNeXt model, change the yaml file passed to --cfg. Several training examples using 4 A100 GPUs are:

cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/deliver_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/kitti360_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/nyu_rgbd.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mfnet_rgbt.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mcubes_rgbadn.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/urbanlf.yaml
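When launching several runs from a script, the commands above can be assembled programmatically. The sketch below is an illustration only: `build_train_command` is a hypothetical helper, and it uses only the module, script path, and flags that appear in the commands above.

```python
import shlex


def build_train_command(cfg, gpus=4):
    """Build the argument list for one distributed training run.

    cfg  -- path to a yaml config, e.g. "configs/deliver_rgbdel.yaml"
    gpus -- number of processes (one per GPU) on this node
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--use_env",
        "tools/train_mm.py", "--cfg", cfg,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_train_command("configs/deliver_rgbdel.yaml")))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root once the environment is activated and `PYTHONPATH` is set as shown above; using fewer GPUs only requires changing `gpus`.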


Evaluation

To evaluate CMNeXt models, please download the respective model weights (GoogleDrive) and place them as:

├── DELIVER
│   ├── cmnext_b2_deliver_rgb.pth
│   ├── cmnext_b2_deliver_rgbd.pth
│   ├── cmnext_b2_deliver_rgbde.pth
│   ├── cmnext_b2_deliver_rgbdel.pth
│   ├── cmnext_b2_deliver_rgbdl.pth
│   ├── cmnext_b2_deliver_rgbe.pth
│   └── cmnext_b2_deliver_rgbl.pth
├── KITTI360
│   ├── cmnext_b2_kitti360_rgb.pth
│   ├── cmnext_b2_kitti360_rgbd.pth
│   ├── cmnext_b2_kitti360_rgbde.pth
│   ├── cmnext_b2_kitti360_rgbdel.pth
│   ├── cmnext_b2_kitti360_rgbdl.pth
│   ├── cmnext_b2_kitti360_rgbe.pth
│   └── cmnext_b2_kitti360_rgbl.pth
├── MCubeS
│   ├── cmnext_b2_mcubes_rgb.pth
│   ├── cmnext_b2_mcubes_rgba.pth
│   ├── cmnext_b2_mcubes_rgbad.pth
│   └── cmnext_b2_mcubes_rgbadn.pth
├── MFNet
│   └── cmnext_b4_mfnet_rgbt.pth
├── NYU_Depth_V2
│   └── cmnext_b4_nyu_rgbd.pth
├── UrbanLF
│   ├── cmnext_b4_urbanlf_real_rgblf1.pth
│   ├── cmnext_b4_urbanlf_real_rgblf33.pth
│   ├── cmnext_b4_urbanlf_real_rgblf8.pth
│   ├── cmnext_b4_urbanlf_real_rgblf80.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf1.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf33.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf8.pth
│   └── cmnext_b4_urbanlf_syn_rgblf80.pth
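The file names in the listing above follow a regular pattern, `cmnext_<backbone>_<dataset>_<modalities>.pth`, which makes it easy to build a path for a given setting. This is a minimal sketch: `weight_path` is a hypothetical helper, and because the listing does not name its root folder, `weights_root` is left as a parameter.

```python
from pathlib import Path


def weight_path(weights_root, dataset_dir, backbone, dataset_tag, modals):
    """Compose a checkpoint path following the naming in the listing above.

    weights_root -- folder holding the per-dataset sub-folders (not named in the listing)
    dataset_dir  -- sub-folder name, e.g. "KITTI360" or "UrbanLF"
    backbone     -- "b2" or "b4"
    dataset_tag  -- lower-case tag in the file name, e.g. "kitti360" or "urbanlf_real"
    modals       -- modality suffix, e.g. "rgbdel" or "rgblf33"
    """
    name = f"cmnext_{backbone}_{dataset_tag}_{modals}.pth"
    return Path(weights_root) / dataset_dir / name
```

For instance, `weight_path(root, "UrbanLF", "b4", "urbanlf_real", "rgblf33")` yields `<root>/UrbanLF/cmnext_b4_urbanlf_real_rgblf33.pth`; calling `.is_file()` on the result checks whether that checkpoint has been downloaded.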

Then set --cfg to the respective config file and run:

cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
CUDA_VISIBLE_DEVICES=0 python tools/val_mm.py --cfg configs/deliver_rgbdel.yaml

The DeLiVER dataset has both validation and test sets. Edit val_mm.py to choose which of the two is evaluated.

To evaluate individual cases (adverse weather conditions or sensor failures), modify the cases list in val_mm.py, as shown below:

# cases = ['cloud', 'fog', 'night', 'rain', 'sun']
# cases = ['motionblur', 'overexposure', 'underexposure', 'lidarjitter', 'eventlowres']
cases = [None] # all

Note that the default value is [None] for all cases.
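Since a misspelled case name is easy to overlook, a small check can be run on the list before evaluation. This is a sketch, not code from val_mm.py: `check_cases` is a hypothetical helper, and the allowed names are copied from the snippet above.

```python
# Case names copied from the commented lists above.
WEATHER_CASES = ["cloud", "fog", "night", "rain", "sun"]
FAILURE_CASES = ["motionblur", "overexposure", "underexposure",
                 "lidarjitter", "eventlowres"]


def check_cases(cases):
    """Return the list unchanged if every entry is a known case or None.

    None stands for "all cases", matching the default cases = [None].
    """
    allowed = set(WEATHER_CASES) | set(FAILURE_CASES) | {None}
    unknown = [case for case in cases if case not in allowed]
    if unknown:
        raise ValueError(f"Unknown case(s): {unknown}")
    return list(cases)
```

For example, `check_cases(["fog", "night"])` passes, while `check_cases(["foggy"])` raises a `ValueError`, because the folder-style name is `fog`.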

DELIVER visualization

Visualization results on the DeLiVER dataset. From left to right: cloudy, foggy, night, and rainy scenes.


License

This repository is under the Apache-2.0 license. For commercial use, please contact the authors.


Citations

If you use the DeLiVER dataset or the CMNeXt model, please cite the following works:

