Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP). [Paper](https://arxiv.org/abs/2405.07284)
The goal of this project is to extend the SAM model (Segment Anything Model [1]) with text prompts using CLIP (Contrastive Language-Image Pretraining [2]). This integration, called SLIP (SAM with CLIP), enables object segmentation without prior training on specific classes or categories.
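In outline: SAM generates class-agnostic mask proposals, and CLIP ranks them against the text prompt. Below is a minimal sketch of that pipeline, assuming the `segment_anything` and `clip` packages are installed; the checkpoint path and the `segment_with_text` helper are hypothetical, and the notebooks in `SLIP demo/` hold the actual implementation.

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# SAM proposes class-agnostic masks; CLIP scores each crop against the prompt.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # hypothetical path
mask_generator = SamAutomaticMaskGenerator(sam)
model, preprocess = clip.load("ViT-B/32", device="cpu")

def segment_with_text(image, prompt):
    """Return the SAM mask whose crop CLIP scores highest for `prompt`.

    `image` is an HxWx3 uint8 numpy array; this helper is illustrative,
    not the repo's API.
    """
    masks = mask_generator.generate(image)
    best_mask, best_score = None, float("-inf")
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]))
        for m in masks:
            x, y, w, h = m["bbox"]  # XYWH box of each mask proposal
            crop = Image.fromarray(image[y:y + h, x:x + w])
            img_feat = model.encode_image(preprocess(crop).unsqueeze(0))
            score = torch.cosine_similarity(img_feat, text_feat).item()
            if score > best_score:
                best_mask, best_score = m["segmentation"], score
    return best_mask
```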
If you use this code or data in your research, please cite the following paper:
```
@misc{gundavarapu2024zero,
      title={Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)},
      author={Saaketh Koundinya Gundavarapu and Arushi Arora and Shreya Agarwal},
      year={2024},
      eprint={2405.07284},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
- `SLIP demo/`
  - `zero_shot_finetuned.ipynb` - Zero-shot segmentation demo after fine-tuning CLIP.
  - `zero_shot_pretrained.ipynb` - Zero-shot segmentation demo using pretrained CLIP.
- `assests/` - Images for plots, model architecture, and test images.
- `baseline classifier/`
  - `classifier output/`
    - `ResNet18_pokemon_output` - Text file with the output after training ResNet18 on the Pokémon dataset.
    - `VGG_pokemon_output` - Text file with the output after training VGG on the Pokémon dataset.
  - `models/`
    - `ResNet18.py` - ResNet18 model.
    - `VGG.py` - VGG model.
  - `run_resnet.sbatch` - Script to train ResNet18.
  - `run_vgg.sbatch` - Script to train VGG.
- `evaluation/`
  - `ResNet_eval.ipynb` - ResNet evaluation on the Pokémon dataset.
  - `SLIP_segment_eval.ipynb` - Evaluation of SLIP after fine-tuning CLIP on the Pokémon dataset.
  - `make_evalutaion_dataset.py` - Creates the evaluation dataset.
  - `pokedex.csv` - Maps each image index to its image class.
  - `pretrained_eval_segment.ipynb` - Evaluation of SLIP using pretrained CLIP on the Pokémon dataset.
- `finetuned CLIP/`
  - `captions.csv` - Captions for CLIP fine-tuning.
  - `clip_grid_search.py` - Runs a grid search over CLIP hyperparameters (see the sketch after this listing).
  - `clip_grid_search_output` - Output from the grid search.
  - `convert_txt_to_csv.py` - Converts the captions text file to a CSV file.
  - `generate_captions.py` - Generates captions for the Pokémon dataset.
  - `run.sbatch` - Script for running the grid search.
- `plots/`
  - `plot_resnet.ipynb` - Plots for ResNet.
  - `plot_CLIP.ipynb` - Plots for CLIP.
  - `text_for_plot.txt` - Best CLIP model output during grid search.
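For orientation, here is a hedged sketch of what the fine-tuning grid search in `clip_grid_search.py` could look like: CLIP is fine-tuned on (image, caption) pairs with its symmetric contrastive loss, sweeping a small hyperparameter grid. The grid values and the `make_caption_loader` helper are illustrative assumptions, not the repo's actual settings.

```python
import itertools
import torch
import torch.nn.functional as F
import clip

def contrastive_loss(model, images, tokens):
    """CLIP's symmetric contrastive loss over a batch of matched pairs."""
    logits_per_image, logits_per_text = model(images, tokens)
    labels = torch.arange(images.size(0))  # pair i matches pair i
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2

# Illustrative grid; the real values live in clip_grid_search.py.
grid = {"lr": [1e-5, 5e-6], "batch_size": [32, 64]}

for lr, bs in itertools.product(grid["lr"], grid["batch_size"]):
    # Load on CPU so the weights are fp32 and directly trainable.
    model, preprocess = clip.load("ViT-B/32", device="cpu")
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for images, captions in make_caption_loader(bs):  # hypothetical loader
        optimizer.zero_grad()
        loss = contrastive_loss(model, images, clip.tokenize(captions))
        loss.backward()
        optimizer.step()
```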
- Run the cells of the notebooks in `SLIP demo/`.
| Model Architecture | Accuracy |
|---|---|
| SLIP - pretrained only | 0.15 |
| SLIP - finetuned | 0.69 |
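For reference, accuracy here is plausibly computed along these lines: for each test image, every Pokémon class name is tried as the text prompt, and the prediction is the class whose prompt scores highest. The `score_fn` parameter below stands in for the CLIP-on-best-SAM-mask similarity from the first sketch; this is an assumption about the evaluation, not the notebooks' exact code.

```python
def zero_shot_accuracy(images, labels, class_names, score_fn):
    """Fraction of images whose best-scoring class prompt matches the label.

    score_fn(image, prompt) -> float is assumed to be the CLIP similarity
    of the best SAM mask crop, as in the earlier sketch (hypothetical).
    """
    correct = 0
    for image, label in zip(images, labels):
        scores = [score_fn(image, f"a photo of {name}") for name in class_names]
        correct += class_names[scores.index(max(scores))] == label
    return correct / len(images)
```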
[1] Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A. C.; Lo, W.-Y.; Dollár, P.; and Girshick, R. 2023. Segment Anything. arXiv:2304.02643.
[2] Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; Krueger, G.; and Sutskever, I. 2021. Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020.
[3] Contrastive Language-Image Pre-training
- Arushi Arora: aa10350@nyu.edu
- Saaketh Koundinya: sg7729@nyu.edu
- Shreya Agarwal: sa6981@nyu.edu