VTprompt

This repository contains the code for the paper: "Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models".

Installation

Environment Setup

Please follow the installation instructions in the Grounded Segment Anything repository to set up the environment.

Usage

  1. Build the Vprompt (visual prompt).
  2. Use the Tprompt (text prompt) to prompt a multimodal large language model to generate answers.
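
Until the official tutorial is available, the two steps above can be illustrated with a minimal sketch. It is not this repository's implementation. It assumes that a visual prompt is an image with bounding boxes drawn around the objects of interest, and that a text prompt is a question prefixed with a description of those marked regions. The names `Box`, `draw_boxes`, and `build_text_prompt` are hypothetical. The image is a plain nested list of RGB tuples so that the example runs without extra dependencies. In practice the boxes would come from a detector such as the one in Grounded Segment Anything.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detection result: a label plus inclusive pixel corners."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def draw_boxes(image, boxes, color=(255, 0, 0)):
    """Return a copy of `image` (rows of RGB tuples) with each box outlined."""
    out = [list(row) for row in image]
    height, width = len(out), len(out[0])
    for b in boxes:
        # Clamp the box to the image; skip boxes that fall entirely outside.
        x0, x1 = max(0, b.x0), min(width - 1, b.x1)
        y0, y1 = max(0, b.y0), min(height - 1, b.y1)
        if x0 > x1 or y0 > y1:
            continue
        for x in range(x0, x1 + 1):  # top and bottom edges
            out[y0][x] = color
            out[y1][x] = color
        for y in range(y0, y1 + 1):  # left and right edges
            out[y][x0] = color
            out[y][x1] = color
    return out


def build_text_prompt(question, boxes):
    """Compose a text prompt that refers to the marked regions (assumed wording)."""
    marked = ", ".join(f"{b.label} (box {i})" for i, b in enumerate(boxes, start=1))
    return (
        f"The image marks these objects with bounding boxes: {marked}. "
        f"Use the marked regions to answer the question.\n"
        f"Question: {question}"
    )


# Step 1: build the visual prompt on a blank 8x6 image.
image = [[(0, 0, 0)] * 8 for _ in range(6)]
boxes = [Box("dog", 1, 1, 4, 3)]
marked_image = draw_boxes(image, boxes)

# Step 2: build the text prompt that accompanies the marked image.
prompt = build_text_prompt("What color is the dog?", boxes)
print(prompt)
```

The marked image and the text prompt would then be passed together to the multimodal model of your choice. That call is omitted here because it depends on the model's own API.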

Evaluation Code and Usage Tutorial

We are currently preparing detailed evaluation code and usage tutorials. Please stay tuned for updates!