This is the official code for "Weakly Supervised Few-Shot and Zero-Shot Semantic Segmentation with Mean Instance Aware Prompt Learning", IEEE International Conference on Multimedia and Expo (ICME) 2023 [Oral].
To install the packages required to run this code, follow the instructions below:
git clone https://github.com/mustafa1728/MIAPNet.git
cd MIAPNet
conda create --name MIAPNet # (optional, for making a conda environment)
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
cd third_party/CLIP
python -m pip install -Ue .
Please choose the PyTorch and cudatoolkit versions according to your CUDA environment. Refer to the PyTorch installation instructions here. For more details on installing detectron2 and selecting the necessary options for your CUDA environment, please refer to the detectron2 installation instructions here.
We experiment with the PASCAL VOC 2012 [page] and MS COCO 2014 [page] datasets. Download the images and corresponding annotations from here and here for PASCAL VOC and COCO, respectively. The code expects the following directory structure:
- MIAPNet
  - datasets
    - VOC2012
      - JPEGImages
    - coco
      - JPEGImages
You may need to use a different directory structure, in which case, change the necessary paths in register_pascal and register_coco.
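Before training, it can help to confirm that the images are where the code expects them. The snippet below is a minimal sketch, assuming the layout `MIAPNet/datasets/VOC2012/JPEGImages` and `MIAPNet/datasets/coco/JPEGImages`; the helper name `missing_dataset_dirs` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Sub-directories assumed from the directory structure described above.
EXPECTED_DIRS = (
    "datasets/VOC2012/JPEGImages",
    "datasets/coco/JPEGImages",
)


def missing_dataset_dirs(repo_root):
    """Return the expected dataset directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for rel in missing_dataset_dirs("."):
        print(f"missing: {rel}")
```

If you use a different layout, adjust `EXPECTED_DIRS` to match the paths you set in register_pascal and register_coco.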
Download our generated pixel pseudo labels for the different folds of both PASCAL and COCO from here. Alternatively, you may generate the pseudo labels yourself by following the training and pseudo label generation procedure outlined in L2G.
After downloading the required datasets, run the preprocess files to generate validation labels for the different splits in ZSS and the 1-way and 2-way settings in FSS.
To train your models, make the necessary changes in the configs of your choice. The configs of ZSS and FSS are in voc for PASCAL VOC and in coco for COCO. Additional ablation experiments can be run using the config files in ablations.
A generic command to run a particular experiment corresponding to a config is:
python3 train_net.py --config-file <config-path>
For example, to run WZSS on fold 0 of PASCAL VOC, run the following command:
python3 train_net.py --config-file ./configs/voc/wzss/fold0.yaml
You can provide command line arguments to modify certain config entries like this:
python3 train_net.py --config-file <config-path> --num-gpus 6 OUTPUT_DIR <output-path> MODEL.WEIGHTS <weight-init-path>
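If you launch many experiments, assembling the command in a script avoids typos in the trailing KEY VALUE overrides. This is a minimal sketch: the helper `build_train_command` and the output directory `./output/voc_wzss_fold0` are hypothetical, and only the flags shown in this README are used.

```python
import shlex


def build_train_command(config_file, num_gpus=None, eval_only=False, overrides=None):
    """Assemble a train_net.py invocation as an argument list."""
    cmd = ["python3", "train_net.py", "--config-file", config_file]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    if eval_only:
        cmd.append("--eval-only")
    # Config overrides are appended as trailing KEY VALUE pairs.
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return cmd


cmd = build_train_command(
    "./configs/voc/wzss/fold0.yaml",
    num_gpus=2,
    overrides={"OUTPUT_DIR": "./output/voc_wzss_fold0"},  # hypothetical path
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.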
To train a model on a particular setting, you first need to run the prompt learning experiment and then run the WZSS experiment using the context vectors and weights learnt in the first stage. For example, for PASCAL VOC fold 0, the steps are:
- Run Prompt Learning
python3 train_net.py --config-file ./configs/voc/prompt_learn/fold0.yaml OUTPUT_DIR <output-path>
- Add the path to the learnt weights in the WZSS config here. Change the following entries accordingly:
    MODEL:
      CLIP_ADAPTER:
        PROMPT_CHECKPOINT: <output-path>/model_final.pth
        REGION_CLIP_ADAPTER:
          PROMPT_CHECKPOINT: <output-path>/model_final.pth
- Run the WZSS experiment
python3 train_net.py --config-file ./configs/voc/wzss/fold0.yaml
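The two stages above can also be chained from a script. The sketch below only builds the two commands; it does not run them. It rests on several assumptions that are not confirmed by this README: that configs for other folds follow the `fold<N>.yaml` naming, that the checkpoint entries can be set as command-line overrides with the dotted keys `MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT` and `MODEL.CLIP_ADAPTER.REGION_CLIP_ADAPTER.PROMPT_CHECKPOINT` instead of editing the config file, and that outputs go to a hypothetical `<output_root>/prompt_learn_fold<N>` directory.

```python
from pathlib import Path


def two_stage_commands(fold, output_root):
    """Build the prompt-learning and WZSS commands for one PASCAL VOC fold."""
    # Hypothetical output layout: <output_root>/prompt_learn_fold<N>/
    prompt_dir = Path(output_root) / f"prompt_learn_fold{fold}"
    checkpoint = (prompt_dir / "model_final.pth").as_posix()

    stage1 = [
        "python3", "train_net.py",
        "--config-file", f"./configs/voc/prompt_learn/fold{fold}.yaml",
        "OUTPUT_DIR", prompt_dir.as_posix(),
    ]
    stage2 = [
        "python3", "train_net.py",
        "--config-file", f"./configs/voc/wzss/fold{fold}.yaml",
        # Assumed override keys, mirroring the config entries named above.
        "MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT", checkpoint,
        "MODEL.CLIP_ADAPTER.REGION_CLIP_ADAPTER.PROMPT_CHECKPOINT", checkpoint,
    ]
    return stage1, stage2


for cmd in two_stage_commands(0, "output"):
    print(" ".join(cmd))
```

If the override keys do not match your config, edit the WZSS config file by hand as described in the steps above.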
By default, a model being trained for WZSS is evaluated at periodic intervals. To evaluate a trained WZSS model separately, simply run:
python3 train_net.py --config-file <config-path> --eval-only --resume
You may set the path to the trained model weights on the command line like this:
python3 train_net.py --config-file <config-path> --eval-only MODEL.WEIGHTS <weight-path>
To evaluate WFSS, a model trained in the WZSS setting can be used directly. The command to run WFSS on PASCAL VOC fold 0 is:
python3 train_net.py --config-file ./configs/voc/wfss/fold0.yaml --eval-only MODEL.WEIGHTS <weight-path>
where <weight-path> is the path to the model saved during WZSS training on fold 0 of PASCAL VOC. Take care that a model trained on a given WZSS fold is evaluated only on the same WFSS fold, for example fold 0 WZSS weights with the fold 0 WFSS config.
Similarly, experiments for the 2-way setting can be run with the same trained model:
python3 train_net.py --config-file ./configs/voc/wfss_2way/fold0.yaml --eval-only MODEL.WEIGHTS <weight-path>
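The fold-matching requirement above is easy to violate when several checkpoints sit side by side. As a minimal sketch, assuming that both the config path and the weight path contain a `fold<N>` token (true for the configs shown here, but only a hypothetical naming convention for your output directories), a small guard can catch a mismatch before evaluation starts. The helper names are illustrative and not part of this repo.

```python
import re


def fold_index(path):
    """Return N from the last 'fold<N>' token in path, or None if absent."""
    matches = re.findall(r"fold(\d+)", str(path))
    return int(matches[-1]) if matches else None


def check_same_fold(config_path, weight_path):
    """Raise ValueError unless both paths refer to the same fold; return the fold."""
    cfg_fold = fold_index(config_path)
    weight_fold = fold_index(weight_path)
    if cfg_fold is None or weight_fold is None:
        raise ValueError("could not find a 'fold<N>' token in one of the paths")
    if cfg_fold != weight_fold:
        raise ValueError(
            f"config is fold {cfg_fold} but weights are fold {weight_fold}"
        )
    return cfg_fold


# Hypothetical weight path, for illustration only.
check_same_fold(
    "./configs/voc/wfss_2way/fold0.yaml",
    "./output/voc_wzss_fold0/model_final.pth",
)
```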
To visualize WZSS predictions during evaluation, set the DATASETS.VIS_MULTIPLIER entry to 1 in the corresponding config. Segmentation maps for seen and unseen categories will be generated in the OUTPUT_DIR/evaluation/pred_vis directory. Run visualize.py after changing dir_name and vis_dir_name accordingly.
To visualize WFSS predictions during evaluation, set the DATASETS.VIS_MULTIPLIER entry to 255 for 1-way and 127 for 2-way in the corresponding configs. Segmentation maps for unseen categories will be generated in the OUTPUT_DIR/evaluation/pred_vis directory. For 1-way, these will be binary maps; for 2-way, they will contain the three distinct values 0, 127 and 254.
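The pixel values quoted above are consistent with each predicted class index being scaled by the multiplier, with index 0 as background. The sketch below illustrates that arithmetic under this assumption; it is not taken from the repo's code, and `visualized_values` is a hypothetical helper.

```python
def visualized_values(num_ways, multiplier):
    """Pixel values when class indices 0..num_ways are scaled by multiplier."""
    return [label * multiplier for label in range(num_ways + 1)]


print(visualized_values(1, 255))  # [0, 255]
print(visualized_values(2, 127))  # [0, 127, 254]
```

Both settings keep the largest value within the 8-bit range, which is why 127 rather than 128 is used for 2-way.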
We thank the authors of MaskFormer, CLIP and Simple Baseline for their awesome work. This repo benefits greatly from them.