[Website (soon)] [arXiv (soon)] [GitHub]
This directory contains the code for the MESS evaluation of OVSeg. Please see the commits for our changes to the model.
Create a conda environment ovseg and install the required packages by running the following setup script (see mess/README.md for details):
bash mess/setup_env.sh
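To verify that the environment was created correctly, a quick import check can help (a minimal sketch; it assumes detectron2 is among the installed packages, which train_net.py requires):
conda activate ovseg
python -c "import detectron2; print(detectron2.__version__)"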
Prepare the datasets by following the instructions in mess/DATASETS.md. The ovseg env can be used for the dataset preparation. If you evaluate multiple models with MESS, you can point the dataset_dir argument and the DETECTRON2_DATASETS environment variable to a common directory, e.g., ../mess_datasets (see mess/DATASETS.md and mess/eval.sh).
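For example, to share one dataset directory across all evaluated models (assuming a sibling directory ../mess_datasets):
export DETECTRON2_DATASETS=../mess_datasets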
Download the OVSeg weights (see https://github.com/facebookresearch/ov-seg/blob/main/GETTING_STARTED.md)
mkdir weights
conda activate ovseg
# Download the weights from Google Drive using gdown. Link: https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view
python -c "import gdown; gdown.download('https://drive.google.com/uc?export=download&confirm=pbef&id=1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy', output='weights/ovseg_swinbase_vitL14_ft_mpt.pth')"
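Before launching the full evaluation, you can optionally check that the checkpoint deserializes (a quick sketch; it assumes PyTorch is installed in the ovseg env):
# Should print a dict-like checkpoint type instead of raising an error
python -c "import torch; print(type(torch.load('weights/ovseg_swinbase_vitL14_ft_mpt.pth', map_location='cpu')))"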
To evaluate the OVSeg model on all datasets of the MESS benchmark, run
bash mess/eval.sh
# for evaluation in the background:
nohup bash mess/eval.sh > eval.log &
tail -f eval.log
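To check intermediate results while the evaluation is running, you can filter the evaluator summaries from the log (this assumes detectron2's default print_csv_format output, which prefixes result lines with "copypaste:"):
grep "copypaste:" eval.log | tail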
For evaluating a single dataset, set DATASET to a dataset name from mess/DATASETS.md, adjust the DETECTRON2_DATASETS path if needed, and run
conda activate ovseg
export DETECTRON2_DATASETS="datasets"
DATASET=<dataset_name>
# OVSeg large model
python train_net.py --num-gpus 1 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS weights/ovseg_swinbase_vitL14_ft_mpt.pth OUTPUT_DIR output/OVSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\)
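To evaluate several datasets sequentially with the same checkpoint, a simple loop works (a sketch; dataset_a and dataset_b are placeholders for names from mess/DATASETS.md):
for DATASET in dataset_a dataset_b; do
  python train_net.py --num-gpus 1 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS weights/ovseg_swinbase_vitL14_ft_mpt.pth OUTPUT_DIR output/OVSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\)
done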
This is the official PyTorch implementation of our paper:
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
Computer Vision and Pattern Recognition Conference (CVPR), 2023
[arXiv] [Project] [huggingface demo]
Please see the installation guide.
Please see the dataset preparation guide.
Please see the getting started instructions.
Please see the open clip training instructions.
The majority of OVSeg is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC-BY-NC; OpenCLIP is licensed under the license of its own repository.
If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.
@inproceedings{liang2023open,
  title={Open-Vocabulary Semantic Segmentation with Mask-adapted {CLIP}},
  author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7061--7070},
  year={2023}
}