CLIP Streamlit Demo
This tutorial shows how to use CLIP inside a Streamlit app. I use the validation captions from the COCO dataset (the images themselves are discarded) and, given an input image, pick the closest caption. Note that this does not generate text from the image; it retrieves the best-matching caption from a fixed set.
Method and CLIP model
CLIP by OpenAI scores a text–image pair simply by the dot product between a text embedding and an image embedding. The text embeddings for the COCO validation captions are precomputed and downloaded from Dropbox; at run time I embed the input image and find the captions whose embeddings are closest to it.
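As a minimal sketch of this retrieval step, assuming the precomputed caption embeddings are stored as a NumPy array next to the caption strings (the file names `coco_text_embeddings.npy` and `coco_captions.txt` and the helper `closest_captions` are hypothetical, not the exact code of this repo):

```python
import clip  # OpenAI's CLIP package: pip install git+https://github.com/openai/CLIP.git
import numpy as np
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical files: precomputed COCO caption embeddings and the captions themselves.
text_embeddings = np.load("coco_text_embeddings.npy")  # shape (num_captions, embed_dim)
captions = open("coco_captions.txt").read().splitlines()

def closest_captions(image: Image.Image, k: int = 5) -> list[str]:
    """Return the k COCO captions whose embeddings are most similar to the image."""
    tensor = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        image_emb = model.encode_image(tensor).cpu().numpy()[0]
    # Normalize both sides so the dot product is cosine similarity.
    image_emb /= np.linalg.norm(image_emb)
    text_norm = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    scores = text_norm @ image_emb
    top = np.argsort(scores)[::-1][:k]
    return [captions[i] for i in top]
```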
Simply run `streamlit run main.py` to open the demo in your browser.
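For reference, a minimal sketch of what such a `main.py` could look like, reusing the hypothetical `closest_captions` helper above (this is an assumption about the app's structure, not the repo's exact code):

```python
import streamlit as st
from PIL import Image

st.title("CLIP caption retrieval")

# Let the user upload an image, then show the closest COCO captions.
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Query image")
    for caption in closest_captions(image):
        st.write(caption)
```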