The Segment Anything Model (SAM) can produce high-quality object masks from several types of prompts, including points, boxes, masks, and text. Unfortunately, the text-prompted version of SAM has not been released. To work around this, we combine SAM with CLIP: SAM generates candidate masks, and CLIP scores the similarity between each output mask and the text prompt. In this way, you can use a text prompt to segment anything.
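The mask-to-text matching step can be sketched as a cosine-similarity ranking. This is a minimal illustration, not the project's actual code: it assumes each SAM mask has already been turned into an embedding vector (for example by running CLIP's image encoder on the masked crop) and that the text prompt has been embedded the same way. The function names `cosine_similarity` and `rank_masks_by_text` and the toy 3-dimensional vectors are hypothetical and stand in for real CLIP features.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_masks_by_text(
    mask_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
) -> List[Tuple[int, float]]:
    """Return (mask_index, score) pairs sorted from most to least similar to the text."""
    scores = [
        (index, cosine_similarity(embedding, text_embedding))
        for index, embedding in enumerate(mask_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real embeddings (assumption: one vector per SAM mask).
mask_embeddings = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
text_embedding = [0.0, 1.0, 0.0]

ranking = rank_masks_by_text(mask_embeddings, text_embedding)
best_index, best_score = ranking[0]
print(f"best mask: {best_index}, similarity: {best_score:.2f}")
```

In a real pipeline the highest-scoring mask, or every mask above a chosen similarity threshold, would be kept as the segmentation for the prompt. The threshold is a tuning choice and is not specified here.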
Start a Gradio service on your local machine with the following script, then try out the project with your own images.
python3 text_sam.py --checkpoint_path ../model/sam_vit_h_4b8939.pth --model_type vit_h
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick. Segment Anything.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision (CLIP). Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8748-8763, 2021.