☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Primary Language: Python
License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)