/Mask-free-OVIS

Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]

Primary LanguagePython

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]

Framework: PyTorch

[Project Page] [arXiv] [PDF] [Suppli] [Slides] [BibTeX]

Contributions

  • Utilize a pre-trained Vision-Language model to identify and localize the object of interest using GradCAM.
  • Employ a weakly-supervised proposal generator to generate bounding box proposals and select the proposal with the highest overlap with the GradCAM map.
  • Crop the image based on the selected proposal and leverage the GradCAM map as a weak prompt to extract a mask using a weakly-supervised segmentation network.
  • Using these generated masks, train an instance segmentation model (Mask-RCNN) eliminating the need for human-provided box-level or pixel-level annotations.

Environment

UBUNTU="18.04"
CUDA="11.0"
CUDNN="8"

Pseudo-mask Generator Pipeline

Installation

conda create --name pseduo_mask_gen

conda activate pseduo_mask_gen

bash pseduo_mask_gen.sh

Preparation

Generate Pseudo-mask

  • Get pseudo labels based on ALBEF and generated pseudo-mask are available here
python pseudo_mask_generator.py
  • Organize dataset in COCO format
python prepare_coco_dataset.py
  • Extract text embedding using CLIP
# pip install git+https://github.com/openai/CLIP.git

python prepare_clip_embedding_for_open_vocab.py
  • Check your final pseudo-mask by visualization
python visualize_coco_style_dataset.py

Pseudo-mask Training Pipeline

Installation

conda create --name maskfree_ovis
conda activate maskfree_ovis
cd $INSTALL_DIR
bash ovis.sh

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

cd ../
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
python setup.py build develop

Data Preparation

Pretrain with Pseudo-Labels

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/pretrain_pseduo_mask.yaml OUTPUT_DIR $OUTPUT_DIR

Finetune

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/finetune.yaml MODEL.WEIGHT $PATH_TO_PRETRAIN_MODEL  OUTPUT_DIR $OUTPUT_DIR

Inference

'Coming Soon...!!!'

Citation

If you found Mask-free OVIS useful in your research, please consider starring ⭐ us on GitHub and citing 📚 us in your research!

@inproceedings{vs2023mask,
  title={Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations},
  author={VS, Vibashan and Yu, Ning and Xing, Chen and Qin, Can and Gao, Mingfei and Niebles, Juan Carlos and Patel, Vishal M and Xu, Ran},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={23539--23549},
  year={2023}
}

Acknowledgement

The codebase is build on top of PB-OVD, CMPL and Wetectron.