
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation (CVPR 2024 Highlight)

Code for the CVPR 2024 paper: Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation.

[Paper] [Project]

This project heavily relies on [AFA] and [CLIP-ES]. Many thanks for their great work!

Preparations

VOC dataset

1. Download

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar

2. Download the augmented annotations

The augmented annotations are from the SBD dataset. Here is a download link for the augmented annotations at DropBox. After downloading SegmentationClassAug.zip, unzip it and move it to VOCdevkit/VOC2012. The directory structure should thus be

VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
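
For reference, assuming SegmentationClassAug.zip has been downloaded to the current directory and extracts to a SegmentationClassAug/ folder, the placement could look like:

unzip SegmentationClassAug.zip
mv SegmentationClassAug VOCdevkit/VOC2012/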

COCO dataset

1. Download

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip

After unzipping the downloaded files, for convenience, we recommend organizing them in VOC style.

MSCOCO/
├── JPEGImages
│    ├── train
│    └── val
└── SegmentationClass
     ├── train
     └── val
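
One possible way to arrange them (train2014/ and val2014/ are the default folder names from the official zips; adjust the paths to your setup):

unzip train2014.zip && unzip val2014.zip
mkdir -p MSCOCO/JPEGImages MSCOCO/SegmentationClass
mv train2014 MSCOCO/JPEGImages/train
mv val2014 MSCOCO/JPEGImages/val
# SegmentationClass/train and SegmentationClass/val are filled with the masks generated in step 2 below.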

2. Generating VOC style segmentation labels for COCO

To generate VOC-style segmentation labels for the COCO dataset, you can use the scripts provided at this repo, or simply download the generated masks from Google Drive.

Create and activate conda environment

conda create --name py38 python=3.8
conda activate py38
pip install -r requirements.txt

Download Pre-trained CLIP ViT-B/16 Weights

Download the pre-trained CLIP ViT-B/16 weights from the official link.

Then, move this model to pretrained/.
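
The exact download URL is listed in the official CLIP repository. Assuming ViT-B-16.pt has already been downloaded to the current directory, placing it could look like:

mkdir -p pretrained
mv ViT-B-16.pt pretrained/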

Modify the config

Three parameters need to be modified based on your paths:

(1) root_dir: your/path/VOCdevkit/VOC2012 or your/path/MSCOCO

(2) name_list_dir: your/path/WeCLIP/datasets/voc or your/path/WeCLIP/datasets/coco

(3) clip_pretrain_path: your/path/WeCLIP/pretrained/ViT-B-16.pt

For VOC, modify them in configs/voc_attn_reg.yaml.

For COCO, modify them in configs/coco_attn_reg.yaml.
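
For example, the relevant entries in configs/voc_attn_reg.yaml would look roughly as follows (the exact key nesting in the shipped config may differ; the paths are placeholders):

root_dir: your/path/VOCdevkit/VOC2012
name_list_dir: your/path/WeCLIP/datasets/voc
clip_pretrain_path: your/path/WeCLIP/pretrained/ViT-B-16.pt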

Train

To start training, just run the following commands.

# train on voc
python scripts/dist_clip_voc.py --config your/path/WeCLIP/configs/voc_attn_reg.yaml
# train on coco
python scripts/dist_clip_coco.py --config your/path/WeCLIP/configs/coco_attn_reg.yaml

Inference

To run inference, first modify the inference model path --model_path in test_msc_flip_voc.py or test_msc_flip_coco.py.

Then, run the following code:

# inference on voc
python test_msc_flip_voc.py --model_path your/inference/model/path/WeCLIP_model_iter_30000.pth
# inference on coco
python test_msc_flip_coco.py --model_path your/inference/model/path/WeCLIP_model_iter_80000.pth

Citation

Please kindly cite our paper if you find it helpful in your work.

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Bingfeng and Yu, Siyue and Wei, Yunchao and Zhao, Yao and Xiao, Jimin},
    title     = {Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {3796-3806}
}

Acknowledgement

Many thanks to AFA: [paper] [Project]

@inproceedings{ru2022learning,
    title = {Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers},
    author = {Lixiang Ru and Yibing Zhan and Baosheng Yu and Bo Du},
    booktitle = {CVPR},
    year = {2022},
  }

Many thanks to CLIP-ES: [paper] [Project]

@InProceedings{Lin_2023_CVPR,
    author    = {Lin, Yuqi and Chen, Minghao and Wang, Wenxiao and Wu, Boxi and Li, Ke and Lin, Binbin and Liu, Haifeng and He, Xiaofei},
    title     = {CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {15305-15314}
}