/villa

ViLLA: Fine-grained vision-language representation learning from real-world data

Primary Language: Python
License: MIT
