
Variational CLIP


VCLIP is an extension of OpenAI's CLIP for variational inference. It was fine-tuned on a subset of Conceptual Captions. This repo contains a simple implementation and a link to the pretrained weights. The implementation extends HuggingFace's FlaxCLIPModel.

Pretrained weights (Google Cloud Storage).
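Since the loading code isn't shown here, the snippet below is only a minimal sketch of the starting point VCLIP builds on: it loads the base FlaxCLIPModel and its processor from HuggingFace (standard transformers classes) and extracts text features. How the fine-tuned VCLIP weights from the link above are restored into the extended model is not specified here, so that step is omitted.

```python
# Sketch only: load the base CLIP model that VCLIP extends.
# FlaxCLIPModel / CLIPProcessor are the standard HuggingFace classes;
# restoring the fine-tuned VCLIP weights from the GCS link is not shown,
# since the serialization format isn't specified in this README.
from transformers import CLIPProcessor, FlaxCLIPModel

model = FlaxCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Tokenize a prompt and compute its CLIP text features (numpy arrays work with Flax).
inputs = processor(text=["a photo of a cat"], return_tensors="np", padding=True)
text_features = model.get_text_features(input_ids=inputs["input_ids"])
```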


VCLIP computes a Gaussian distribution over image-embedding space for each prompt, rather than returning a single point. The similarity score for a (text, image) pair is the Gaussian probability density of the image embedding under that distribution, rather than cosine similarity.
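As a rough illustration of that scoring rule (not code from this repo), here is a minimal sketch: the text tower is assumed to output a mean and log-variance over image-embedding space, and the (text, image) score is the diagonal-Gaussian log-density of the image embedding, shown next to CLIP's cosine similarity for contrast. The names `gaussian_log_density`, `text_mu`, and `text_logvar` are illustrative assumptions, not identifiers from the repo.

```python
import jax.numpy as jnp


def gaussian_log_density(image_embed, text_mu, text_logvar):
    """VCLIP-style score (sketch): log N(image_embed; text_mu, diag(exp(text_logvar)))."""
    var = jnp.exp(text_logvar)
    return -0.5 * jnp.sum(
        text_logvar + jnp.log(2.0 * jnp.pi) + (image_embed - text_mu) ** 2 / var,
        axis=-1,
    )


def clip_similarity(image_embed, text_embed):
    """Standard CLIP score: cosine similarity between unit-normalized embeddings."""
    image_embed = image_embed / jnp.linalg.norm(image_embed, axis=-1, keepdims=True)
    text_embed = text_embed / jnp.linalg.norm(text_embed, axis=-1, keepdims=True)
    return jnp.sum(image_embed * text_embed, axis=-1)
```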

Figure: CLIP and VCLIP compared side by side (left: CLIP, right: VCLIP).