
[CVPR 2024] Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation


Abstract

This project is the official implementation of the paper. It presents Rein, a robust fine-tuning method developed to effectively harness Vision Foundation Models (VFMs) for Domain Generalized Semantic Segmentation (DGSS). Rein achieves state-of-the-art results on Cityscapes $\rightarrow$ ACDC and on GTAV $\rightarrow$ Cityscapes + Mapillary + BDD100K. Using only synthetic data, Rein reaches 78.4% mIoU on the Cityscapes validation set; using only the Cityscapes training set, it reaches an average mIoU of 77.56% on the ACDC test set!

Rein Framework
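At a high level, Rein keeps the VFM backbone frozen and inserts a lightweight set of learnable tokens that refine the features passed between backbone layers, so only a small fraction of parameters is trained. The snippet below is a minimal conceptual sketch of this idea in PyTorch; it is an illustration under simplifying assumptions (module name, dimensions, and projections are ours), not the module implemented in this repository:

```python
import torch
import torch.nn as nn

class ReinStyleRefiner(nn.Module):
    """Conceptual sketch of token-based feature refinement (NOT the official module).

    A small set of learnable tokens attends to frozen backbone features and
    produces a residual update; only this refiner (and the decode head) train.
    """

    def __init__(self, dim: int, num_tokens: int = 100):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) patch features emitted by a frozen VFM layer
        attn = torch.softmax(
            feats @ self.tokens.t() / feats.shape[-1] ** 0.5, dim=-1
        )  # (B, N, num_tokens): similarity between patches and tokens
        delta = self.proj(attn @ self.tokens)  # token-conditioned refinement
        return feats + delta  # residual update; backbone weights stay frozen
```

During training, the backbone parameters are excluded from the optimizer, so only the refiner and the segmentation head receive gradients.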

🔥 News!

  • We have uploaded the configs for ResNet and ConvNeXt backbones.

  • 🔥 We have uploaded the checkpoint and config for training with an additional 1/16 of the Cityscapes training set, which reaches 82.5% mIoU on the Cityscapes validation set!

  • Rein has been accepted at CVPR 2024!

  • 🔥 Using only the data from the Cityscapes training set, we achieved an average mIoU of 77.56% on the ACDC test set! This ranks first among DGSS methods on the ACDC benchmark! The checkpoint is available in the release.

  • 🔥 Using only synthetic data (UrbanSyn, GTAV, and Synthia), Rein achieved 78.4% mIoU on the Cityscapes validation set! The checkpoint is available in the release.

Performance Under Various Settings (DINOv2).

| Setting | mIoU | Config | Log & Checkpoint |
|---------|------|--------|------------------|
| GTAV $\rightarrow$ Cityscapes | 66.7 | config | log & checkpoint |
| +Synthia $\rightarrow$ Cityscapes | 72.2 | config | log & checkpoint |
| +UrbanSyn $\rightarrow$ Cityscapes | 78.4 | config | log & checkpoint |
| +1/16 of Cityscapes training $\rightarrow$ Cityscapes | 82.5 | config | log & checkpoint |
| GTAV $\rightarrow$ BDD100K | 60.0 | config | log & checkpoint |
| Cityscapes $\rightarrow$ ACDC | 77.6 | config | log & checkpoint |
| Cityscapes $\rightarrow$ Cityscapes-C | 60.0 | config | log & checkpoint |

Performance For Various Backbones (Trained on GTAV).

| Backbone | Pretraining | Citys. mIoU | Config | Log & Checkpoint |
|----------|-------------|-------------|--------|------------------|
| ResNet50 | ImageNet1k | 49.1 | config | log & checkpoint |
| ResNet101 | ImageNet1k | 45.9 | config | log & checkpoint |
| ConvNeXt-Large | ImageNet21k | 57.9 | config | log & checkpoint |
| ViT-Small | DINOv2 | 55.3 | config | log & checkpoint |
| ViT-Base | DINOv2 | 64.3 | config | log & checkpoint |

Try and Test

Experience the demo: open demo.ipynb in any Jupyter-compatible editor to explore our demonstration.

For testing on the Cityscapes dataset, refer to the 'Environment Setup' and 'Dataset Preparation' sections below.

Environment Setup

To set up your environment, execute the following commands:

conda create -n rein -y
conda activate rein
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia -y
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
pip install "mmsegmentation>=1.0.0"
pip install "mmdet>=3.0.0"
pip install xformers=='0.0.20' # optional for DINOv2
pip install -r requirements.txt
pip install future tensorboard
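After installation, a quick sanity check like the one below (a minimal snippet we suggest, not part of the repository) confirms that the core libraries import and that CUDA is visible:

```python
# Sanity-check the environment created by the commands above.
import torch
import mmcv
import mmengine
import mmseg
import mmdet

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__, "| mmengine:", mmengine.__version__)
print("mmseg:", mmseg.__version__, "| mmdet:", mmdet.__version__)
```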

Dataset Preparation

The preparation is similar to that of DDB.

Cityscapes: Download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from the Cityscapes Dataset and extract them to data/cityscapes.

Mapillary: Download MAPILLARY v1.2 from Mapillary Research and extract it to data/mapillary.

GTA: Download all image and label packages from TU Darmstadt and extract them to data/gta.

Prepare datasets with these commands:

cd Rein
mkdir data
# Convert data for validation if preparing for the first time
python tools/convert_datasets/gta.py data/gta # Source domain
python tools/convert_datasets/cityscapes.py data/cityscapes
# Convert Mapillary to Cityscapes format and resize for validation
python tools/convert_datasets/mapillary2cityscape.py data/mapillary data/mapillary/cityscapes_trainIdLabel --train_id
python tools/convert_datasets/mapillary_resize.py data/mapillary/validation/images data/mapillary/cityscapes_trainIdLabel/val/label data/mapillary/half/val_img data/mapillary/half/val_label
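To spot-check that the conversion produced Cityscapes train IDs, you can inspect any converted label map. This is an illustrative snippet; the file path below is only an example and depends on your extracted data:

```python
# Spot-check one converted label map: pixel values should be
# Cityscapes train IDs (0-18) plus 255 for "ignore".
import numpy as np
from PIL import Image

path = "data/cityscapes/gtFine/val/frankfurt/frankfurt_000000_000294_gtFine_labelTrainIds.png"  # example file
label = np.array(Image.open(path))
print("unique ids:", np.unique(label))
```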

(Optional) ACDC: Download all image and label packages from ACDC and extract them to data/acdc.

(Optional) UrbanSyn: Download all image and label packages from UrbanSyn and extract them to data/urbansyn.

The final folder structure should look like this:

Rein
├── ...
├── checkpoints
│   ├── dinov2_vitl14_pretrain.pth
│   ├── dinov2_rein_and_head.pth
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── bdd100k
│   │   ├── images
│   │   │   ├── 10k
│   │   │   │   ├── train
│   │   │   │   ├── val
│   │   ├── labels
│   │   │   ├── sem_seg
│   │   │   │   ├── masks
│   │   │   │   │   ├── train
│   │   │   │   │   ├── val
│   ├── mapillary
│   │   ├── training
│   │   ├── cityscapes_trainIdLabel
│   │   ├── half
│   │   │   ├── val_img
│   │   │   ├── val_label
│   ├── gta
│   │   ├── images
│   │   ├── labels
├── ...
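Before training, a small check like the following (an illustrative helper, not shipped with the repository) can verify that the layout above is in place:

```python
# Verify the expected dataset/checkpoint layout shown above.
from pathlib import Path

expected = [
    "checkpoints/dinov2_vitl14_pretrain.pth",
    "data/cityscapes/leftImg8bit/train",
    "data/cityscapes/gtFine/train",
    "data/bdd100k/images/10k/val",
    "data/bdd100k/labels/sem_seg/masks/val",
    "data/mapillary/half/val_img",
    "data/mapillary/half/val_label",
    "data/gta/images",
    "data/gta/labels",
]
for p in expected:
    status = "ok     " if Path(p).exists() else "MISSING"
    print(f"{status} {p}")
```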

Pretraining Weights

  • Download: Download the pre-trained weights from facebookresearch for testing. Place them in the checkpoints directory without changing the file names.
  • Convert: Convert the pre-trained weights for training or evaluation (a quick way to inspect the result is sketched below this list).
    python tools/convert_models/convert_dinov2.py checkpoints/dinov2_vitl14_pretrain.pth checkpoints/dinov2_converted.pth
    (optional for 1024x1024 resolution)
    python tools/convert_models/convert_dinov2.py checkpoints/dinov2_vitl14_pretrain.pth checkpoints/dinov2_converted_1024x1024.pth --height 1024 --width 1024
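The converted file is a standard PyTorch checkpoint, so you can inspect it to confirm the conversion succeeded. A hedged sketch follows; the exact key layout inside the checkpoint depends on the converter:

```python
# Inspect the converted DINOv2 checkpoint (a plain torch checkpoint file).
import torch

ckpt = torch.load("checkpoints/dinov2_converted.pth", map_location="cpu")
# Weights may sit at the top level or under a "state_dict" entry.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state), "entries; sample keys:", list(state)[:5])
```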

Evaluation

Run the evaluation:

python tools/test.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py checkpoints/dinov2_rein_and_head.pth --backbone dinov2_converted.pth

For most of the provided release checkpoints, you can run this command to evaluate:

python tools/test.py /path/to/cfg /path/to/checkpoint --backbone /path/to/dinov2_converted.pth #(or dinov2_converted_1024x1024.pth)
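The configs are standard mmengine configs, so you can also load one in Python to inspect or tweak settings before evaluating. A minimal example using mmengine's public API (the field names assume the usual mmseg 1.x config layout):

```python
# Load an mmengine config and inspect a few fields before evaluation.
from mmengine.config import Config

cfg = Config.fromfile("configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py")
print(cfg.model.type)                   # registered model class name
print(cfg.train_dataloader.batch_size)  # per-GPU batch size
```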

Training

Start training on a single GPU:

python tools/train.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py

Start training on multiple GPUs:

PORT=12345 CUDA_VISIBLE_DEVICES=1,2,3,4 bash tools/dist_train.sh configs/dinov2/rein_dinov2_mask2former_1024x1024_bs4x2.py NUM_GPUS

where NUM_GPUS should match the number of GPUs made visible via CUDA_VISIBLE_DEVICES (4 in this example).


Citation

If you find our code or data helpful, please cite our paper:

@article{wei2023stronger,
  title={Stronger, Fewer, \& Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation},
  author={Wei, Zhixiang and Chen, Lin and Jin, Yi and Ma, Xiaoxiao and Liu, Tianle and Ling, Pengyang and Wang, Ben and Chen, Huaian and Zheng, Jinjin},
  journal={arXiv preprint arXiv:2312.04265},
  year={2023}
}

Acknowledgment

Our implementation is mainly based on the following repositories. Thanks to their authors.