What do we learn from inverting CLIP models? And what does a CLIP model 'see' in an image?
Primary Language: Python