ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
Primary language: Python. License: MIT.