An ONNX-based implementation of CLIP that doesn't depend on torch or torchvision. It also has a friendlier API than the original implementation.
This works by:
- running the text and vision encoders (the ViT-B/32 variant) in ONNX Runtime
- using a pure NumPy version of the tokenizer
- using a pure NumPy+PIL version of the preprocess function.
The PIL dependency could also be removed with minimal code changes - see preprocessor.py.
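As an illustration, here is a minimal sketch of what such a NumPy+PIL preprocessing function can look like. It is not the library's exact code (see preprocessor.py for that); the 224-pixel input size and the normalization constants below are the standard CLIP ViT-B/32 values and are used here as assumptions.

import numpy as np
from PIL import Image

# Standard CLIP normalization constants (an assumption here; see preprocessor.py
# for the values the library actually uses).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess_sketch(image: Image.Image, size: int = 224) -> np.ndarray:
    # Resize so the shorter side equals `size`, then center-crop to size x size.
    w, h = image.size
    scale = size / min(w, h)
    image = image.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = image.size
    left, top = (w - size) // 2, (h - size) // 2
    image = image.crop((left, top, left + size, top + size))
    # Scale to [0, 1], normalize per channel, and return channels-first (3, size, size).
    arr = np.asarray(image, dtype=np.float32) / 255.0
    arr = (arr - CLIP_MEAN) / CLIP_STD
    return arr.transpose(2, 0, 1)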
To install, run the following in the root of the repository:
pip install .
All you need to do is call the OnnxClip model class. An example:
from onnx_clip import OnnxClip, softmax, get_similarity_scores
from PIL import Image
images = [Image.open("onnx_clip/data/franz-kafka.jpg").convert("RGB")]
texts = ["a photo of a man", "a photo of a woman"]
# Your images/texts will get split into batches of this size before being
# passed to CLIP, to limit memory usage
onnx_model = OnnxClip(batch_size=16)
# Unlike the original CLIP, there is no need to run tokenization/preprocessing
# separately - simply run get_image_embeddings directly on PIL images/NumPy
# arrays, and run get_text_embeddings directly on strings.
image_embeddings = onnx_model.get_image_embeddings(images)
text_embeddings = onnx_model.get_text_embeddings(texts)
# To use the embeddings for zero-shot classification, you can use these two
# functions. Here we run on a single image, but any number is supported.
logits = get_similarity_scores(image_embeddings, text_embeddings)
probabilities = softmax(logits)
print("Logits:", logits)
for text, p in zip(texts, probabilities[0]):
print(f"Probability that the image is '{text}': {p:.3f}")
Note: The following may give timeout errors due to the file sizes. If so, this can be fixed with poetry version 1.1.13 - see this related issue.
Install Poetry
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
To set up the project and create a virtual environment, run the following command from the project's root directory.
poetry install
To build a source and wheel distribution of the library, run the following command from the project's root directory.
poetry build
First, remove/move the downloaded LFS files so that they're not packaged with the code. Otherwise, this creates a huge .whl file that PyPI refuses, which causes confusing errors.
Then, follow this guide.
tl;dr: go to the PyPI account page, generate an API token, and put it into the $PYPI_PASSWORD environment variable. Then run
poetry publish --build --username lakera --password $PYPI_PASSWORD
Please let us know how we can support you: earlyaccess@lakera.ai.
See the LICENSE file in this repository.
The franz-kafka.jpg image is taken from here.