/Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Primary Language: Python

Watchers