/CLIP-text-image-interpretability

Get the text tokens a CLIP ViT model produces for an image, and visualize its attention as a heatmap.

Primary Language: Python
