/vstar

PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Primary Language: Python
License: MIT
