[Website (soon)] [arXiv (soon)] [GitHub]
This directory contains the code for the MESS evaluation of OVSeg. Please see the commits for our changes to the model.
Create a conda environment ovseg and install the required packages by running the following setup script (see mess/README.md for details):
bash mess/setup_env.sh
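To verify that the environment was created correctly, a quick import check can help (a minimal sketch; it assumes detectron2 is among the installed packages, which train_net.py requires):
conda activate ovseg
python -c "import detectron2; print(detectron2.__version__)"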
Prepare the datasets by following the instructions in mess/DATASETS.md. The ovseg env can be used for the dataset preparation. If you evaluate multiple models with MESS, you can point the dataset_dir argument and the DETECTRON2_DATASETS environment variable to a common directory, e.g., ../mess_datasets (see mess/DATASETS.md and mess/eval.sh).
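For example, to share one dataset directory across all evaluated models (assuming a sibling directory ../mess_datasets):
export DETECTRON2_DATASETS=../mess_datasets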
Download the OVSeg weights (see https://github.com/facebookresearch/ov-seg/blob/main/GETTING_STARTED.md)
mkdir weights
conda activate ovseg
# Download the weights from Google Drive using gdown. Link: https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view
python -c "import gdown; gdown.download('https://drive.google.com/uc?export=download&confirm=pbef&id=1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy', output='weights/ovseg_swinbase_vitL14_ft_mpt.pth')"
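Before launching the full evaluation, you can optionally check that the checkpoint deserializes (a quick sketch; it assumes PyTorch is installed in the ovseg env):
# Should print a dict-like checkpoint type instead of raising an error
python -c "import torch; print(type(torch.load('weights/ovseg_swinbase_vitL14_ft_mpt.pth', map_location='cpu')))"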
To evaluate the OVSeg model on all datasets of the MESS benchmark, run
bash mess/eval.sh
# for evaluation in the background:
nohup bash mess/eval.sh > eval.log &
tail -f eval.log
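To check intermediate results while the evaluation is running, you can filter the evaluator summaries from the log (this assumes detectron2's default print_csv_format output, which prefixes result lines with "copypaste:"):
grep "copypaste:" eval.log | tail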
For evaluating a single dataset, set DATASET to a dataset name from mess/DATASETS.md, adjust the DETECTRON2_DATASETS path if needed, and run
conda activate ovseg
export DETECTRON2_DATASETS="datasets"
DATASET=<dataset_name>
# OVSeg large model
python train_net.py --num-gpus 1 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS weights/ovseg_swinbase_vitL14_ft_mpt.pth OUTPUT_DIR output/OVSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\)
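To evaluate several datasets sequentially with the same checkpoint, a simple loop works (a sketch; dataset_a and dataset_b are placeholders for names from mess/DATASETS.md):
for DATASET in dataset_a dataset_b; do
  python train_net.py --num-gpus 1 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS weights/ovseg_swinbase_vitL14_ft_mpt.pth OUTPUT_DIR output/OVSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\)
done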
This is the official PyTorch implementation of our paper:
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
Computer Vision and Pattern Recognition Conference (CVPR), 2023
[arXiv] [Project] [huggingface demo]
Please see the installation guide.
Please see the dataset preparation guide.
Please see the getting started instructions.
Please see the open clip training instructions.
The majority of OVSeg is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC-BY-NC; OpenCLIP is licensed under the license of its own repository.
If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.
@inproceedings{liang2023open,
  title={Open-Vocabulary Semantic Segmentation with Mask-adapted {CLIP}},
  author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7061--7070},
  year={2023}
}