CLIPImageEncoder

CLIPImageEncoder is an image encoder that wraps the image embedding functionality of the CLIP model from Hugging Face Transformers. It is meant to be used together with CLIPTextEncoder, because CLIP embeds text and images into the same latent space, so embeddings from the two encoders can be compared directly.
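Because both encoders target the same latent space, matching a text query against images can be reduced to comparing vectors. The following is a minimal sketch of that idea, not the executor's own code: the helper names `cosine_similarity` and `rank_images` are hypothetical, and the 3-dimensional vectors are made-up placeholders standing in for real CLIP embeddings, which are much larger.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the text embedding."""
    scores = {
        image_id: cosine_similarity(text_embedding, embedding)
        for image_id, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder vectors (assumption): in practice these would come from
# CLIPImageEncoder for the images and CLIPTextEncoder for the query.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
}
text_embedding = [0.8, 0.2, 0.1]  # stands in for an embedded text query

ranking = rank_images(text_embedding, image_embeddings)
print(ranking)
```

Cosine similarity is used here because it ignores vector length and compares direction only; other similarity measures could be substituted.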

For more information on GPU usage and volume mounting, please refer to the documentation. For more information on the CLIP model, please check out the blog post, paper and Hugging Face documentation.