SoM-LLaVA

Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.

Primary language: Python