Woodpecker: Hallucination Correction for MLLMs 🔧

Online Demo | Paper [Coming Soon]

This is the first work to correct hallucinations in multimodal large language models. If you have any questions, please feel free to email bradyfu24@gmail.com or add WeChat ID xjtupanda.

News

[09-26] We have released our code and the online demo. The paper is coming soon! 🔥🔥🔥

Demo

Please feel free to try our Online Demo!

Preliminary

Create conda environment

conda create -n corrector python=3.10
conda activate corrector
pip install -r requirements.txt

Install required packages and models

Install spaCy and the relevant model packages, following the instructions in Link. These are used for text processing operations.

pip install -U spacy
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_md
python -m spacy download en_core_web_sm
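To confirm the downloads succeeded, you can check that each model is importable, since spaCy installs its models as regular Python packages. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of this repository.

```python
import importlib.util

# Model package names taken from the download commands above.
SPACY_MODELS = ("en_core_web_lg", "en_core_web_md", "en_core_web_sm")


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(SPACY_MODELS)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```

Run it inside the activated conda environment so that the check looks at the same site-packages the corrector will use.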

For our open-set detector, install GroundingDINO following the instructions in Link.

Usage

1. Inference

To make corrections based on an image and a text output from MLLM, run the inference code as follows:

python inference.py \
        --image-path {path/to/image} \
        --text "Some text to be corrected." \
        --detector-config "path/to/GroundingDINO_SwinT_OGC.py" \
        --detector-model "path/to/groundingdino_swint_ogc.pth" \
        --api-key "sk-xxxxxxx"

The output text is printed in the terminal, and intermediate results are saved by default to ./intermediate_view.json.
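If you want to call the script from Python, for example to correct several captions in a loop, the command can be assembled and run with `subprocess`. This is only a sketch under a few assumptions: it is run from the repository root, the flags are exactly those shown in the command above, and the helper names (`build_inference_command`, `run_inference`, `load_intermediate`) are hypothetical and not part of this repository. The schema of the intermediate JSON file is not assumed; the loader simply returns whatever the file contains.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_inference_command(image_path, text, detector_config, detector_model, api_key):
    """Assemble the argument list for inference.py using the flags shown above."""
    return [
        sys.executable, "inference.py",
        "--image-path", str(image_path),
        "--text", text,
        "--detector-config", str(detector_config),
        "--detector-model", str(detector_model),
        "--api-key", api_key,
    ]


def run_inference(**kwargs):
    """Run inference.py and return everything it printed to standard output."""
    cmd = build_inference_command(**kwargs)
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def load_intermediate(path="./intermediate_view.json"):
    """Load the intermediate results file; no particular schema is assumed."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Passing the arguments as a list avoids shell quoting problems when the text to be corrected contains quotes or other special characters.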

2. Demo setup

We use mPLUG-Owl as our default MLLM in experiments. If you wish to replicate the online demo, please clone the project and modify the variables in https://github.com/BradyFU/Hallucination-Correction-for-MLLMs/blob/c6826d82785828673064d73d9722fd71214f4b3c/gradio_demo.py#L7 and https://github.com/BradyFU/Hallucination-Correction-for-MLLMs/blob/c6826d82785828673064d73d9722fd71214f4b3c/gradio_demo.py#L35-L36

Then simply run:

CUDA_VISIBLE_DEVICES=0,1 python gradio_demo.py

This places the corrector components on GPU 0 and mPLUG-Owl on GPU 1.
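Note that CUDA_VISIBLE_DEVICES renumbers the GPUs it exposes: inside the process, devices are always indexed from 0 in the order listed. With `0,1` the in-process indices match the physical ones, but with a different selection they do not. The sketch below illustrates the mapping in plain Python; the function name `visible_gpu_map` is hypothetical and is not used by the demo itself.

```python
import os


def visible_gpu_map(env=None):
    """Map in-process device names to the physical GPU ids in CUDA_VISIBLE_DEVICES.

    Returns an empty dict when the variable is unset or empty; when it is
    unset, CUDA exposes all GPUs under their default numbering.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    return {f"cuda:{index}": gpu_id for index, gpu_id in enumerate(ids)}


if __name__ == "__main__":
    # Running on physical GPUs 2 and 3: the process still sees cuda:0 and cuda:1.
    print(visible_gpu_map({"CUDA_VISIBLE_DEVICES": "2,3"}))
```

So to run the demo on other cards, changing only the environment variable (for example `CUDA_VISIBLE_DEVICES=2,3`) should be enough, assuming the device indices configured in gradio_demo.py stay at 0 and 1.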

Acknowledgement