We benchmark VLLM for referring image captioning. From the paper "Segment and Caption Anything".
Primary Language: Python