[ICME'23] Weakly Supervised Few-Shot and Zero-Shot Semantic Segmentation with Mean Instance Aware Prompt Learning



This is the official code for "Weakly Supervised Few-Shot and Zero-Shot Semantic Segmentation with Mean Instance Aware Prompt Learning", IEEE International Conference on Multimedia and Expo (ICME) 2023 [Oral].

Installation and setup

Installing dependencies

To install the packages required to run this code, follow the instructions below:

git clone https://github.com/mustafa1728/MIAPNet.git
cd MIAPNet
conda create --name MIAPNet  # optional: create a dedicated conda environment
conda activate MIAPNet       # optional: activate it so the packages below install into it
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
cd third_party/CLIP
python -m pip install -Ue .

Please choose the PyTorch and cudatoolkit versions according to your CUDA environment. Refer to the PyTorch installation instructions here. For more details on installing detectron2 and selecting the necessary options for your CUDA environment, please refer to the detectron2 installation instructions here.

Data Preparation

We experiment with the PASCAL VOC 2012 [page] and MS COCO 2014 [page] datasets. Download the images and corresponding annotations from here and here for PASCAL and COCO respectively. The code expects the following directory structure:

   - datasets
      - VOC2012
         - JPEGImages
      - coco
         - JPEGImages

You may need to use a different directory structure, in which case, change necessary paths in register_pascal and register_coco.
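
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` is hypothetical, and it checks only the two image folders listed above, since the annotation folder names are not given here.

```python
from pathlib import Path

# Sub-directories listed in the expected layout above. Annotation folders are
# not named in this README, so they are deliberately left out of the check.
EXPECTED_DIRS = ("VOC2012/JPEGImages", "coco/JPEGImages")


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks as expected.")
```

If you use a different layout, adjust `EXPECTED_DIRS` to match the paths you set in register_pascal and register_coco.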

Download our generated pixel pseudo labels for the different folds of both PASCAL and COCO from here. Alternatively, you may generate the pseudo labels yourself by following the training and pseudo label generation procedure outlined in L2G.

After downloading the required datasets, run preprocess files to generate validation labels for different splits in ZSS and the 1-way and 2-way settings in FSS.


To train your models, make necessary changes in the configs of your choice. The configs of ZSS and FSS are in voc for PASCAL VOC and in coco for COCO. Additional ablation experiments can be run using config files in ablations.

A generic command to run a particular experiment corresponding to a config is:

python3 train_net.py --config-file <config-path>

For example, to run WZSS on fold 0 of PASCAL VOC, run the following command:

python3 train_net.py --config-file ./configs/voc/wzss/fold0.yaml

You can provide command line arguments to modify certain config entries like this:

python3 train_net.py --config-file <config-path> --num-gpus 6 OUTPUT_DIR <output-path> MODEL.WEIGHTS <weight-init-path>
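
The trailing `KEY VALUE` pairs act as dotted-key overrides on the loaded config. Assuming the repository follows the usual detectron2 convention of merging these pairs into the config after the file is read, the effect can be sketched in plain Python. The function `apply_overrides` and the example paths are hypothetical and only illustrate the idea; they are not the repository's implementation.

```python
def apply_overrides(cfg, opts):
    """Apply ["KEY", "VALUE", ...] pairs to a nested dict, in place.

    A dotted key such as "MODEL.WEIGHTS" addresses cfg["MODEL"]["WEIGHTS"].
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError if the section is missing
        if leaf not in node:
            raise KeyError(f"unknown config entry: {key}")
        node[leaf] = value
    return cfg


# Toy config holding only the two entries used in the command above.
cfg = {"OUTPUT_DIR": "./output", "MODEL": {"WEIGHTS": ""}}
apply_overrides(cfg, ["OUTPUT_DIR", "runs/exp1", "MODEL.WEIGHTS", "init.pth"])
print(cfg)
```

Rejecting unknown keys, as this sketch does, means a mistyped entry name fails loudly instead of being silently ignored.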

ZSS training

To train a model on a particular setting, you first need to run the prompt learning experiment and then run the WZSS experiment using the context vectors and weights learnt in the first part. For example, for PASCAL VOC fold 0, the commands are:

  1. Run Prompt Learning
    python3 train_net.py --config-file ./configs/voc/prompt_learn/fold0.yaml OUTPUT_DIR <output-path> 
  2. Add the path to the learnt weights in WZSS config here. Change the following entries accordingly:
            PROMPT_CHECKPOINT: <output-path>/model_final.pth
                PROMPT_CHECKPOINT: <output-path>/model_final.pth
  3. Run the WZSS experiment
    python3 train_net.py --config-file ./configs/voc/wzss/fold0.yaml
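
The three steps above can be sketched as a small helper that assembles both commands and the checkpoint path to paste into the WZSS config. This is only an illustration under stated assumptions: `zss_commands` is a hypothetical name, it covers only the PASCAL VOC config paths shown above, and it prints the commands rather than running them because step 2 is a manual config edit.

```python
import shlex


def zss_commands(fold, prompt_output_dir):
    """Build the two training commands and the checkpoint path for one VOC fold."""
    prompt_cfg = f"./configs/voc/prompt_learn/fold{fold}.yaml"
    wzss_cfg = f"./configs/voc/wzss/fold{fold}.yaml"
    # Step 1: prompt learning, writing its outputs to prompt_output_dir.
    stage1 = ["python3", "train_net.py", "--config-file", prompt_cfg,
              "OUTPUT_DIR", prompt_output_dir]
    # Step 2: value to set for the PROMPT_CHECKPOINT entries in the WZSS config.
    checkpoint = f"{prompt_output_dir}/model_final.pth"
    # Step 3: WZSS training with the edited config.
    stage2 = ["python3", "train_net.py", "--config-file", wzss_cfg]
    return stage1, checkpoint, stage2


# "output/prompt_fold0" is a placeholder output directory.
stage1, checkpoint, stage2 = zss_commands(0, "output/prompt_fold0")
print(shlex.join(stage1))
print("PROMPT_CHECKPOINT:", checkpoint)
print(shlex.join(stage2))
```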


By default, a model being trained for WZSS is evaluated at periodic intervals. To evaluate a trained WZSS model separately, simply run:

python3 train_net.py --config-file <config-path> --eval-only --resume

You may set the path to a trained model weight in command line like this:

python3 train_net.py --config-file <config-path> --eval-only MODEL.WEIGHTS <weight-path>

WFSS evaluation

To evaluate WFSS, a model trained in the WZSS setting can be used directly. The command to run WFSS on PASCAL VOC fold 0 is:

python3 train_net.py --config-file ./configs/voc/wfss/fold0.yaml --eval-only MODEL.WEIGHTS <weight-path>

where <weight-path> points to the model saved during WZSS training on fold 0 of PASCAL VOC. Take care that fold 0 WZSS models are tested on fold 0 WFSS only, and likewise for the other folds.

Similarly, experiments for the 2-way setting can be run with the same trained model:

python3 train_net.py --config-file ./configs/voc/wfss_2way/fold0.yaml --eval-only MODEL.WEIGHTS <weight-path>
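
Since WZSS weights must be paired with the WFSS config of the same fold, a quick guard before launching evaluation may help. The sketch below assumes that you keep `foldN` somewhere in the weight path (for example in the output directory name); that naming is a convention of this example, not something the repository enforces, and `fold_of` and `same_fold` are hypothetical helpers.

```python
import re


def fold_of(path):
    """Return the fold number found in `path` (e.g. 'fold0' gives 0), or None."""
    match = re.search(r"fold(\d+)", path)
    return int(match.group(1)) if match else None


def same_fold(config_path, weight_path):
    """True only if both paths name a fold and the fold numbers agree."""
    cfg_fold = fold_of(config_path)
    weight_fold = fold_of(weight_path)
    return cfg_fold is not None and cfg_fold == weight_fold


# Placeholder weight path; substitute your own WZSS output directory.
print(same_fold("./configs/voc/wfss/fold0.yaml",
                "output/wzss_fold0/model_final.pth"))
```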



During WZSS evaluation, set the DATASETS.VIS_MULTIPLIER entry to 1 in the corresponding config. Segmentation maps for seen and unseen categories will be generated in the OUTPUT_DIR/evaluation/pred_vis directory. Run visualize.py after changing dir_name and vis_dir_name accordingly.


During WFSS evaluation, set the DATASETS.VIS_MULTIPLIER entry to 255 for 1-way and 127 for 2-way in the corresponding configs. Segmentation maps for unseen categories will be generated in the OUTPUT_DIR/evaluation/pred_vis directory. For 1-way, these will be binary maps, and for 2-way they will have 0, 127 and 254 as the three distinct values.
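
The pixel values described above are consistent with each saved value being a class index scaled by the multiplier (0, 127 and 254 are 0, 1 and 2 times 127). Under that assumption, which is inferred from the values listed here rather than taken from the code, a saved map can be converted back to class indices as sketched below. The helper `to_class_indices` is hypothetical and works on nested lists so that it needs no extra packages.

```python
def to_class_indices(pred_map, vis_multiplier):
    """Convert saved pixel values back to class indices by integer division."""
    if vis_multiplier <= 0:
        raise ValueError("vis_multiplier must be a positive integer")
    return [[value // vis_multiplier for value in row] for row in pred_map]


# 2-way example: background, first class and second class.
two_way = [[0, 127, 254],
           [254, 0, 127]]
print(to_class_indices(two_way, 127))

# 1-way example: binary map stored as 0 and 255.
one_way = [[0, 255],
           [255, 0]]
print(to_class_indices(one_way, 255))
```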


We thank the authors of Maskformer, CLIP and Simple Baseline for their excellent work. This repo benefits greatly from them.