Evaluating CLIP's cross-modal grounding using explainability methods.
Primary language: Jupyter Notebook
License: MIT