[MM'23] QA-CLIMS

This is the official PyTorch implementation of our paper:

QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]

Environment

Python 3.7
PyTorch 1.7.1
torchvision 0.8.2

pip install -r requirements.txt
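The sketch below checks that the active environment matches the versions listed above. It assumes only that `torch` and `torchvision` expose `__version__`, which they do. The helper names (`find_mismatches`, `installed_versions`, `normalize`) are hypothetical and not part of this repository. Local build tags such as `+cu110` are ignored.

```python
import importlib
import platform

# Versions listed in the Environment section of this README.
REQUIRED = {"python": "3.7", "torch": "1.7.1", "torchvision": "0.8.2"}


def normalize(version):
    """Drop local build tags, e.g. '1.7.1+cu110' -> '1.7.1'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {name: (installed, required)} for every package that does not match.

    A required version matches if the installed version equals it or starts
    with it followed by '.', so '3.7' accepts Python '3.7.12'.
    """
    mismatches = {}
    for name, want in required.items():
        have = installed.get(name)
        if have is None:
            mismatches[name] = (None, want)
            continue
        have = normalize(have)
        if have != want and not have.startswith(want + "."):
            mismatches[name] = (have, want)
    return mismatches


def installed_versions():
    """Query the running interpreter; packages that fail to import are omitted."""
    versions = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            pass
    return versions


if __name__ == "__main__":
    problems = find_mismatches(installed_versions())
    for name, (have, want) in problems.items():
        print(f"{name}: found {have}, expected {want}")
    if not problems:
        print("Environment matches the tested versions.")
```

Newer versions may still work; this only flags differences from the tested setup.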

PASCAL VOC2012

You can download the following files here.

Description	Filename(s)
FG & BG VQA results	`voc_vqa_fg_blip.npy` `voc_vqa_bg_blip.npy`
FG & BG VQA text features	`voc_vqa_fg_blip_ViT-L-14_cache.npy` `voc_vqa_bg_blip_ViT-L-14_cache.npy`
pre-trained baseline model	`res50_cam.pth`
QA-CLIMS model	`res50_qa_clims.pth`

1. Prepare VQA result features

Download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy listed above and put them in vqa/.

Alternatively, you can generate them yourself:

First, generate the VQA results by following third_party/README.

Then, run the following command to generate the VQA text features:

python gen_text_feats_cache.py voc \
    --vqa_fg_file vqa/voc_vqa_fg_blip.npy \
    --vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
    --vqa_bg_file vqa/voc_vqa_bg_blip.npy \
    --vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
    --clip ViT-L/14
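Before training, it can help to sanity-check the cached feature files. This is a minimal sketch that assumes (not confirmed by this README) that each cache was written with `np.save` and holds either a plain array or a pickled Python object such as a dict. `np.load(..., allow_pickle=True)` handles both, and `.item()` unwraps a 0-d object array. The helpers `load_cache` and `describe` are hypothetical. Only enable `allow_pickle` for files from a source you trust.

```python
import os

import numpy as np


def load_cache(path):
    """Load an .npy file that may contain an array or a pickled object."""
    data = np.load(path, allow_pickle=True)
    # np.save on a dict stores a 0-d object array; .item() recovers the dict.
    if isinstance(data, np.ndarray) and data.dtype == object and data.ndim == 0:
        return data.item()
    return data


def describe(obj):
    """Short, human-readable summary of a loaded cache."""
    if isinstance(obj, dict):
        return f"dict, len={len(obj)}"
    if isinstance(obj, np.ndarray):
        return f"array, shape={obj.shape}, dtype={obj.dtype}"
    return type(obj).__name__


if __name__ == "__main__":
    # File names follow the command above; run from the repository root.
    for split in ("fg", "bg"):
        path = f"vqa/voc_vqa_{split}_blip_ViT-L-14_cache.npy"
        if not os.path.exists(path):
            print(path, "-> not found")
            continue
        print(path, "->", describe(load_cache(path)))
```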

2. Train QA-CLIMS and generate initial CAMs

Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.

bash run_voc12_qa_clims.sh
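Steps 1 and 2 ask for the VQA text features and the baseline model to be in place before training. A minimal pre-flight check, assuming the relative layout described in this README and that it is run from the repository root. The helper `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the instructions in this README.
VOC12_REQUIRED = [
    "vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy",
    "vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy",
    "cam-baseline-voc12/res50_cam.pth",
]


def missing_files(root, required=VOC12_REQUIRED):
    """Return the required paths (relative to root) that are not regular files."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing before `bash run_voc12_qa_clims.sh`:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All required files found.")
```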

3. Train IRNet and generate pseudo semantic masks

bash run_voc12_sem_seg.sh

4. Train DeepLab using the pseudo semantic masks

Please follow deeplab-pytorch or CLIMS.

MS COCO2014

You can download the following files here.

Description	Filename(s)
FG & BG VQA results	`coco_vqa_fg_blip.npy` `coco_vqa_bg_blip.npy`
FG & BG VQA text features	`coco_vqa_fg_blip_ViT-L-14_cache.npy` `coco_vqa_bg_blip_ViT-L-14_cache.npy`
pre-trained baseline model	`res50_cam.pth`
QA-CLIMS model	`res50_qa_clims.pth`

Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy in vqa/, and res50_cam.pth in cam-baseline-coco14/.

Then, run the following commands:

bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
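The two scripts are meant to run in order: as in steps 2 and 3 for VOC, the second stage builds on the CAMs produced by the first. Below is a small wrapper that runs them sequentially and stops at the first failure. It is a sketch that assumes `bash` is on the PATH and that it is launched from the repository root. The function names are hypothetical; the script names follow the `run_<dataset>_*.sh` pattern used in this README for `voc12` and `coco14`.

```python
import subprocess

STAGES = ("qa_clims", "sem_seg")


def pipeline_commands(dataset):
    """Build the commands for one dataset, either 'voc12' or 'coco14'."""
    if dataset not in ("voc12", "coco14"):
        raise ValueError(f"unknown dataset: {dataset}")
    return [["bash", f"run_{dataset}_{stage}.sh"] for stage in STAGES]


def run_all(commands, run=subprocess.run):
    """Run commands in order; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        print("running:", " ".join(cmd), flush=True)
        run(cmd, check=True)
```

For example, `run_all(pipeline_commands("coco14"))` runs both COCO scripts. Passing `run` as a parameter keeps the wrapper easy to test without launching the real training jobs.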

Citation

If you find this code useful for your research, please consider citing our paper:

@inproceedings{deng2023qa-clims,
  title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
  author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={5572--5583},
  year={2023}
}

This repository is largely based on CLIMS and IRNet; many thanks for their great work!
