
[CVPRW'23] The official PyTorch implementation of NamedMask


NamedMask: Distilling Segmenters from Complementary Foundation Models

Official PyTorch implementation for NamedMask. Details can be found in the paper. [paper] [poster] [project page]



Demo

Please find our demo built with Hugging Face and Gradio.

Preparation

1. Download datasets

Please download datasets of interest first by visiting the following links:

Note that Cityscapes and ImageNet2012 require you to sign up for an account. In addition, you need to download ImageNet2012 if you want to train NamedMask yourself.

We advise you to put the downloaded dataset(s) into the following directory structure for ease of implementation:

{your_dataset_directory}
├──cityscapes
│  ├──gtFine
│  ├──leftImg8bit
├──coca
│  ├──binary
│  ├──image
├──coco2017
│  ├──annotations
│  ├──train2017
│  ├──val2017
├──ImageNet2012
│  ├──train
│  ├──val
├──ImageNet-S
│  ├──ImageNetS50
│  ├──ImageNetS300
│  ├──ImageNetS919
├──VOCdevkit
   ├──VOC2012
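Before moving on, it can help to confirm that the downloaded datasets match this layout. The following is a minimal sketch, not part of the repository: the function name `find_missing_dirs` and the `EXPECTED_LAYOUT` table are illustrative, with the directory names copied from the tree above. Datasets that were not downloaded at all are skipped, since only the datasets of interest are needed.

```python
from pathlib import Path

# Expected sub-directories, copied from the directory tree above.
EXPECTED_LAYOUT = {
    "cityscapes": ["gtFine", "leftImg8bit"],
    "coca": ["binary", "image"],
    "coco2017": ["annotations", "train2017", "val2017"],
    "ImageNet2012": ["train", "val"],
    "ImageNet-S": ["ImageNetS50", "ImageNetS300", "ImageNetS919"],
    "VOCdevkit": ["VOC2012"],
}


def find_missing_dirs(dataset_root, layout=EXPECTED_LAYOUT):
    """Return {dataset: [missing sub-directories]} for datasets found under dataset_root.

    A dataset whose top-level directory is absent is skipped entirely.
    """
    root = Path(dataset_root)
    missing = {}
    for dataset, subdirs in layout.items():
        if not (root / dataset).is_dir():
            continue  # dataset not downloaded; nothing to check
        absent = [s for s in subdirs if not (root / dataset / s).is_dir()]
        if absent:
            missing[dataset] = absent
    return missing
```

Calling `find_missing_dirs("{your_dataset_directory}")` returns an empty dictionary when every downloaded dataset has the expected sub-directories.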

2. Install required Python packages:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c anaconda ujson
conda install -c conda-forge pyyaml
conda install -c conda-forge pycocotools
conda install -c anaconda scipy
pip install opencv-python
pip install git+https://github.com/openai/CLIP.git

Please note that the required version of each package may vary depending on your local setup (e.g., your CUDA version).
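To check that the installation succeeded, one option is to test whether each package can be located by Python. This is a small sketch rather than a script shipped with the repository; `missing_modules` is an illustrative name. Note that some import names differ from the install names (`pyyaml` is imported as `yaml`, `opencv-python` as `cv2`, and the CLIP repository as `clip`).

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "tqdm", "matplotlib",
    "ujson", "yaml", "pycocotools", "scipy", "cv2", "clip",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing:", missing_modules() or "none")
```

This only confirms that the modules are installed; it does not verify that their versions are mutually compatible.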

NamedMask training/inference

NamedMask is trained with pseudo-labels from either an unsupervised saliency detector (e.g., SelfMask) or category experts, which refine the predictions made by the saliency network. We therefore need to generate pseudo-labels before training NamedMask. You can skip this part if you only want to run inference with the pre-trained weights provided below.

1. Generate pseudo-labels

To compute pseudo-masks for images of the categories in Cityscapes, COCO2017, CoCA, or VOC2012, we provide a dictionary file (.json format) for each benchmark. It maps each category to a list of 500 ImageNet2012 image paths retrieved by CLIP (with the ViT-L/14@336px architecture). This file has the following structure:

{
    "category_a": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
    "category_b": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
    ...
}
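A quick way to inspect a downloaded dictionary file is to load it and report any category that does not have the expected 500 paths. This is a hedged sketch; the function name `summarise_category_file` is hypothetical and only the file structure shown above is assumed.

```python
import json


def summarise_category_file(json_path, expected_per_category=500):
    """Return {category: path count} for categories whose count differs from the expected one."""
    with open(json_path) as f:
        category_to_paths = json.load(f)
    return {
        category: len(paths)
        for category, paths in category_to_paths.items()
        if len(paths) != expected_per_category
    }
```

An empty result means every category maps to exactly `expected_per_category` image paths.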

Before loading this file in the following steps, you need to replace {your_imagenet_dir} in every path with your own ImageNet2012 directory (by default, it is set to /home/cs-shin1/datasets/ImageNet2012).
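One possible way to do this replacement is sketched below. It is not part of the repository: `rewrite_imagenet_prefix` is an illustrative helper that assumes every path starts with the default prefix stated above, and it leaves any path that does not start with that prefix unchanged.

```python
import json

# Default prefix stated above.
DEFAULT_PREFIX = "/home/cs-shin1/datasets/ImageNet2012"


def rewrite_imagenet_prefix(src_json, dst_json, new_prefix, old_prefix=DEFAULT_PREFIX):
    """Swap old_prefix for new_prefix in every image path and write the result to dst_json."""
    with open(src_json) as f:
        category_to_paths = json.load(f)

    new_prefix = new_prefix.rstrip("/")  # avoid a double slash after the prefix
    updated = {}
    for category, paths in category_to_paths.items():
        updated[category] = [
            new_prefix + path[len(old_prefix):] if path.startswith(old_prefix) else path
            for path in paths
        ]

    with open(dst_json, "w") as f:
        json.dump(updated, f)
    return updated
```

Writing to a separate `dst_json` keeps the downloaded file intact in case the replacement needs to be redone.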

Please download a dictionary file for a benchmark on which you want to evaluate and put it in the ImageNet2012 directory:

Then, open selfmask.sh in the scripts directory and set

DIR_ROOT={your_working_directory}
DIR_DATASET={your_ImageNet2012_directory}
CATEGORY_TO_P_IMAGES_FP={your_category_to_p_images_fp}  # this should point to a json file you downloaded above

Then run:

bash selfmask.sh

This will generate pseudo-masks for images retrieved by CLIP (with ViT-L/14@336px architecture) from the ImageNet2012 training set. The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_selfmask.

If you want to skip this process, please download the pre-computed pseudo-masks and uncompress them into {your_ImageNet2012_directory}/train_pseudo_masks_selfmask:

Optionally, if you want to refine the pseudo-masks with a category expert (after finishing the above step), open the expert_$DATASET_NAME_category.sh file and configure DIR_ROOT and CATEGORY_TO_P_IMAGES_FP as appropriate. Then run:

bash expert_$DATASET_NAME_category.sh

Currently, we only provide code for training experts of the VOC2012 categories. The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_experts.

If you want to skip this process, please download the pre-computed pseudo-masks:

Please uncompress the .zip file into {your_ImageNet2012_directory}/train_pseudo_masks_experts.

2. Training

Once the pseudo-masks are created (or downloaded and uncompressed), set the path to the directory that contains them in a configuration file. For example, to train a model with pseudo-masks from experts for the VOC2012 categories, open the voc_val_n500_cp2_ex.yaml file and change the

category_to_p_images_fp: {your_category_to_p_images_fp}  # this should point to a json file you downloaded above
dir_ckpt: {your_dir_ckpt}  # this should point to a checkpoint directory
dir_train_dataset: {your_dir_train_dataset}  # this should point to ImageNet2012 directory (as an index dataset)
dir_val_dataset: {your_dir_val_dataset}  # this should point to a benchmark directory

arguments as appropriate.
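If you prefer to script these changes rather than edit the file by hand, the sketch below rewrites the values line by line and keeps the trailing comments. It is an illustration only: `set_config_values` is a hypothetical helper, and it assumes the four arguments appear as simple top-level `key: value` lines, as in the snippet above. Values containing spaces or characters that are special in YAML would need quoting.

```python
import re

# Matches a top-level "key: value" line with an optional trailing "  # comment".
KEY_VALUE_LINE = re.compile(r"^([A-Za-z_]\w*):\s*(.*?)(\s+#.*)?$")


def set_config_values(yaml_path, new_values):
    """Replace the values of the given top-level keys in place, preserving comments."""
    with open(yaml_path) as f:
        lines = f.read().splitlines()

    remaining = dict(new_values)
    for i, line in enumerate(lines):
        match = KEY_VALUE_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            lines[i] = f"{key}: {remaining.pop(key)}{comment}"

    if remaining:
        # Fail loudly rather than silently leaving a placeholder in the config.
        raise KeyError(f"keys not found in {yaml_path}: {sorted(remaining)}")

    with open(yaml_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `set_config_values("voc_val_n500_cp2_ex.yaml", {"dir_ckpt": "/path/to/ckpt"})` would update only the `dir_ckpt` line.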

Then, run

bash voc_val_n500_cp2_sr10100_ex.sh

Note that the model is evaluated at regular intervals (every fixed number of iterations) during training, and the final weights are saved in your checkpoint directory.

3. Inference

To evaluate a model with pre-trained weights on a benchmark, e.g., VOC2012, please run (after customising the four arguments above)

bash voc_val_n500_cp2_sr10100_ex.sh $PATH_TO_WEIGHTS

Pre-trained weights

We provide the pre-trained weights of NamedMask:

| benchmark | split | IoU (%) | pixel accuracy (%) | link |
|---|---|---|---|---|
| Cityscapes (object) | val | 18.2 | 93.0 | weights (~102 MB) |
| CoCA | - | 27.4 | 82.0 | weights (~102 MB) |
| COCO2017 | val | 27.7 | 76.4 | weights (~102 MB) |
| ImageNet-S50 | test | 47.5 | - | weights (~102 MB) |
| ImageNet-S300 | test | 33.1 | - | weights (~103 MB) |
| ImageNet-S919 | test | 23.1 | - | weights (~103 MB) |
| VOC2012 | val | 59.3 | 89.2 | weights (~102 MB) |
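For reference, the two metrics in the table can be computed from a confusion matrix as sketched below. This uses the conventional definitions (IoU averaged over classes, and the fraction of correctly labelled pixels) and is an assumption about what the table reports; the evaluation code in this repository may differ in details such as ignored labels. The function names are illustrative.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: rows index ground-truth classes, columns index predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def mean_iou_and_pixel_accuracy(matrix):
    """Return (mean IoU over classes that occur, pixel accuracy), both in [0, 1]."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))

    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)

    mean_iou = sum(ious) / len(ious) if ious else 0.0
    pixel_accuracy = correct / total if total else 0.0
    return mean_iou, pixel_accuracy
```

Here `pred` and `target` are flat sequences of integer class labels, one per pixel; multiplying the returned values by 100 gives percentages.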

We additionally offer the pre-trained weights of the category experts for 20 classes in VOC2012:

| category | link |
|---|---|
| aeroplane | weights (~102 MB) |
| bicycle | weights (~102 MB) |
| bird | weights (~102 MB) |
| boat | weights (~102 MB) |
| bottle | weights (~102 MB) |
| bus | weights (~102 MB) |
| car | weights (~102 MB) |
| cat | weights (~102 MB) |
| chair | weights (~102 MB) |
| cow | weights (~102 MB) |
| dining table | weights (~102 MB) |
| dog | weights (~102 MB) |
| horse | weights (~102 MB) |
| motorbike | weights (~102 MB) |
| person | weights (~102 MB) |
| potted plant | weights (~102 MB) |
| sheep | weights (~102 MB) |
| sofa | weights (~102 MB) |
| train | weights (~102 MB) |
| tv/monitor | weights (~102 MB) |

Citation

@inproceedings{shin2023namedmask,
  title = {NamedMask: Distilling Segmenters from Complementary Foundation Models},
  author = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
  booktitle = {CVPRW},
  year = {2023}
}

Acknowledgements

We borrowed the code for SelfMask and DeepLabv3+ from

If you have any questions about our code/implementation, please contact us at gyungin [at] robots [dot] ox [dot] ac [dot] uk.