
Official PyTorch implementation of the paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties".


ProLab: Property-level Label Space

Junfei Xiao1, Ziqi Zhou2, Wenxuan Li1, Shiyi Lan3, Jieru Mei1, Zhiding Yu3,
Bingchen Zhao4, Alan Yuille1, Yuyin Zhou2, Cihang Xie2

1Johns Hopkins University, 2UCSC, 3NVIDIA, 4University of Edinburgh

Teaser Image

Paper | Property-level Label Space | Model Zoo | Training & Evaluation

News

  • [12/21] 🔥 ProLab: Property-level Label Space is released. We propose to retrieve descriptive properties grounded in common-sense knowledge and build a property-level label space from them, which makes segmentation models stronger and interpretable. Please check out the paper.

Method

method
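For intuition only, the following minimal sketch shows how a property-level label space can supervise a segmentation model: each class is mapped to a multi-hot vector over K clustered descriptive properties, every pixel is trained against the property vector of its ground-truth class, and class scores are recovered at inference by matching predicted property vectors back to the classes. The tensor names, the binary cross-entropy objective, and the assignment-matrix formulation below are illustrative assumptions rather than this repository's exact implementation.

import torch
import torch.nn.functional as F

# Illustrative shapes (assumptions, not the repo's interfaces):
#   pixel_logits:      per-pixel scores over K properties, shape (B, K, H, W)
#   class_to_property: multi-hot class-to-property assignment, shape (C, K)
#   gt_seg:            ground-truth class indices, shape (B, H, W)
#   (ignore-index handling is omitted for brevity)

def property_level_loss(pixel_logits, class_to_property, gt_seg):
    """Supervise each pixel with the property vector of its ground-truth class."""
    target = class_to_property[gt_seg].permute(0, 3, 1, 2).float()  # (B, K, H, W)
    # Multi-label objective over descriptive properties instead of a
    # single-label cross-entropy over class names.
    return F.binary_cross_entropy_with_logits(pixel_logits, target)

def classify_pixels(pixel_logits, class_to_property):
    """Recover per-class scores by matching predicted properties to each class."""
    probs = pixel_logits.sigmoid()                                  # (B, K, H, W)
    scores = torch.einsum('bkhw,ck->bchw', probs, class_to_property.float())
    return scores.argmax(dim=1)                                     # (B, H, W)

Because different classes share properties, a pixel's predicted property vector can still be matched against categories never seen during training, which is the intuition behind the generalization behavior described next.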

Emergent Generalization Ability

ProLab models show an emergent ability to generalize to out-of-domain categories and even unknown categories.

Contents

Getting Started

Our segmentation code is developed on top of MMSegmentation and ViT-Adapter.

Setup

We have tested two environments: torch 1.9 + CUDA 11.1 + MMSegmentation v0.20.2, and torch 1.13.1 + CUDA 11.7 + MMSegmentation v0.27.0.

Environment 1 (torch 1.9+cuda 11.1+MMSegmentation v0.20.2)

conda create -n prolab python=3.8
conda activate prolab
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # for Mask2Former
pip install mmsegmentation==0.20.2
pip install -r requirements.txt
cd ops && sh make.sh # compile deformable attention

Environment 2 (torch 1.13.1+cuda 11.7+MMSegmentation v0.27.0)

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # may need to relax mmdet's mmcv version requirement
pip install mmsegmentation==0.27.0
pip install -r requirements.txt
cd ops && sh make.sh # compile deformable attention
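
After installing either environment, a quick sanity check such as the following (our suggestion, not a script shipped with this repository) confirms the expected versions are in place and that CUDA is visible:

# Sanity check for either environment (not part of the repository).
import torch
import mmcv
import mmseg

print("torch     :", torch.__version__)    # expect 1.9.0+cu111 or 1.13.1+cu117
print("mmcv-full :", mmcv.__version__)     # expect 1.4.2 or 1.7.0
print("mmseg     :", mmseg.__version__)    # expect 0.20.2 or 0.27.0
print("CUDA      :", torch.cuda.is_available())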

Data Preparation

ADE20K/Cityscapes/COCO Stuff/Pascal Context

Please follow the guidelines in MMSegmentation to download ADE20K, Cityscapes, COCO Stuff and Pascal Context.

BDD

Please visit the official website to download the BDD dataset.

Property-level Label Space

Descriptive Properties and Clustered Embeddings (Ready-to-use)

We provide the descriptive properties retrieved with GPT-3.5 and the property-level labels (clustered language embeddings), ready to use.

Descriptive Properties Retrieval (Optional)

We provide generate_descrtiptions.ipynb, which retrieves descriptive properties with GPT-3.5 (via the API) and Llama-2 (deployed locally).
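
As a rough illustration (not the notebook's exact prompts or code), retrieving properties for a single category through the GPT-3.5 API with the pre-1.0 openai package could look like the sketch below; the prompt wording and the post-processing are assumptions.

# Illustrative sketch of property retrieval with the pre-1.0 `openai` package.
# The prompt and post-processing are placeholders; see generate_descrtiptions.ipynb
# for the prompts actually used.
from typing import List
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def retrieve_properties(category: str) -> List[str]:
    prompt = (
        f"List the visual properties that help recognize a '{category}' in an "
        "image. Answer with short phrases, one per line."
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = response["choices"][0]["message"]["content"]
    return [line.strip("-*• ").strip() for line in text.splitlines() if line.strip()]

print(retrieve_properties("traffic light"))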

Encode Descriptions into Embeddings (Optional)

We also provide generate_embeddings.ipynb, which encodes the descriptive properties into embeddings with Sentence Transformer (huggingface, paper) and BAAI-BGE (huggingface, paper) models and clusters them, step by step.
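
A minimal sketch of that pipeline is shown below, assuming the BGE encoder is loaded through sentence-transformers and the descriptions are clustered with k-means; the checkpoint name, normalization, and number of clusters are illustrative and may differ from the notebook.

# Illustrative sketch: encode descriptive properties and cluster them into a
# fixed-size property-level label space (settings are placeholders; see
# generate_embeddings.ipynb for the actual pipeline).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

descriptions = [
    "has a rectangular housing with red, yellow, and green lights",
    "is mounted on a pole near an intersection",
    "has four legs and a flat seat",
    "is often made of wood or plastic",
]

# BGE checkpoints load directly through sentence-transformers.
encoder = SentenceTransformer("BAAI/bge-base-en")
embeddings = encoder.encode(descriptions, normalize_embeddings=True)

# Cluster the descriptions into K property embeddings; K=2 here only for the
# demo (the paper's title points to a space of 256 descriptions).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)                  # cluster id of each description
print(kmeans.cluster_centers_.shape)   # (K, embedding_dim)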

Model Zoo

ADE20K

| Framework | Backbone      | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint   |
|-----------|---------------|----------|---------|-----------|------|--------|--------------|
| UperNet   | ViT-Adapter-B | DeiT-B   | 320k    | 512       | 49.0 | config | Google Drive |
| UperNet   | ViT-Adapter-L | BEiT-L   | 160k    | 640       | 58.2 | config | Google Drive |
| UperNet   | ViT-Adapter-L | BEiTv2-L | 80k     | 896       | 58.7 | config | Google Drive |

COCO-Stuff-164K

| Framework | Backbone      | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint   |
|-----------|---------------|----------|---------|-----------|------|--------|--------------|
| UperNet   | ViT-Adapter-B | DeiT-B   | 160k    | 512       | 45.4 | config | Google Drive |

Pascal Context

| Framework | Backbone      | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint   |
|-----------|---------------|----------|---------|-----------|------|--------|--------------|
| UperNet   | ViT-Adapter-B | DeiT-B   | 160k    | 512       | 58.2 | config | Google Drive |

Cityscapes

| Framework | Backbone      | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint   |
|-----------|---------------|----------|---------|-----------|------|--------|--------------|
| UperNet   | ViT-Adapter-B | DeiT-B   | 160k    | 768       | 81.4 | config | Google Drive |

BDD

| Framework | Backbone      | Pretrain | Lr schd | Crop Size | mIoU | Config | Checkpoint   |
|-----------|---------------|----------|---------|-----------|------|--------|--------------|
| UperNet   | ViT-Adapter-B | DeiT-B   | 160k    | 768       | 65.7 | config | Google Drive |

Training & Evaluation

Training

The following example command trains ViT-Adapter-B + UperNet on ADE20K on a single node with 8 GPUs:

sh dist_train.sh configs/ADE20K/upernet_deit_adapter_base_512_320k_ade20k_bge_base.py 8

Evaluation

The following example command evaluates ViT-Adapter-B + UperNet on the COCO-Stuff val set on a single node with 8 GPUs:

sh dist_test.sh configs/COCO_Stuff/upernet_deit_adapter_base_512_160k_coco_stuff_bge_base.py 8 --eval mIoU

Citation

If you find this paper useful for your work, please cite:

@article{xiao2023semantic,
  author    = {Xiao, Junfei and Zhou, Ziqi and Li, Wenxuan and Lan, Shiyi and Mei, Jieru and Yu, Zhiding and Yuille, Alan and Zhou, Yuyin and Xie, Cihang},
  title     = {A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties},
  journal   = {arXiv preprint arXiv:2312.13764},
  year      = {2023},
}

Acknowledgement

GPT-3.5 and Llama-2 are used for retrieving descriptive properties.

Sentence Transformer and BAAI-BGE are used as description embedding models.

MMSegmentation and ViT-Adapter are used as the segmentation codebase.

Many thanks to all these great projects.