PEFT-MLLM

Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models".

Paper link: https://arxiv.org/abs/2406.05130

Install

  1. Clone this repository
git clone https://github.com/alenai97/PEFT-MLLM.git
cd PEFT-MLLM
  2. Install dependencies
conda create -n peft-mllm python=3.10 -y
conda activate peft-mllm
pip install --upgrade pip
pip install -e .
pip install flash-attn --no-build-isolation
  3. Additional packages

PEFT

cd peft
pip install -e .
cd ..

transformers

cd transformers
pip install -e .
cd ..
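
After installation, you can quickly verify that the local peft and transformers copies (rather than the PyPI releases) are the ones being imported. The check below is only an illustration; it prints each package's version and import location.

python -c "import peft; print(peft.__version__, peft.__file__)"
python -c "import transformers; print(transformers.__version__, transformers.__file__)"
python -c "import flash_attn; print(flash_attn.__version__)"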

Train

  1. Data Preparation

Please download the following datasets: ScienceQA, Flickr30K, IconQA, Vizwiz, OCRVQA, OKVQA, and VQAv2, and organize them as follows under datasets. A one-line helper for creating this layout is shown after the tree.

├── scienceqa
│   └── train
├── flickr30k
│   └── train
├── vizwiz
│   └── train
├── okvqa
│   └── train
├── ocrvqa
│   └── train
├── vqav2
│   └── train
└── iconqa
    ├── choose_txt
    │   └── train
    └── fill_in_blank
        └── train
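
The empty directory skeleton above can be created in one step; the paths mirror the tree exactly, so adjust them if your copies of the datasets live elsewhere.

mkdir -p datasets/{scienceqa,flickr30k,vizwiz,okvqa,ocrvqa,vqav2}/train
mkdir -p datasets/iconqa/{choose_txt,fill_in_blank}/train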

For the data format, please refer to LLaVA and Qwen-VL.

You can also follow the data format in datasets/scienceqa/train_sqa_llava.json and datasets/scienceqa/train_sqa_qwen.json.
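
For reference, a LLaVA-style record typically looks like the sketch below. The field names follow the common LLaVA conversation format, while the id, image path, question, and answer are made up for illustration; treat the JSON files above as the authoritative examples.

cat << 'EOF' > example_llava_record.json
[
  {
    "id": "example-0001",
    "image": "scienceqa/train/example-0001.png",
    "conversations": [
      {"from": "human", "value": "<image>\nWhich of these states is farthest north?"},
      {"from": "gpt", "value": "West Virginia"}
    ]
  }
]
EOF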

  2. Start fine-tuning

You can find all the training scripts in scripts. For example, start with scripts/llava/peft_lora.sh.
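
From the repository root, that script can be launched directly; depending on your setup, you may first need to edit the model and dataset paths inside it.

bash scripts/llava/peft_lora.sh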

To freeze the connector, please add one of the following flags to the training command:

--freeze_mm_mlp_adapter True: for LLaVA-1.5 and ShareGPT4V.

--freeze_connector True: for Qwen-VL-Chat.
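
As a quick, illustrative check, you can grep the script you are using to see whether the flag is already present before adding it to its training command:

grep -n "freeze_mm_mlp_adapter" scripts/llava/peft_lora.sh || echo "not set; add --freeze_mm_mlp_adapter True to the training command"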

Evaluation

The evaluation code will be released soon.

📅 Plan: We will implement additional PEFT methods for the MLLM community in the future.

Citation

@misc{zhou2024empiricalstudyparameterefficientfinetuning,
      title={An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models}, 
      author={Xiongtao Zhou and Jie He and Yuhua Ke and Guangyao Zhu and Víctor Gutiérrez-Basulto and Jeff Z. Pan},
      year={2024},
      eprint={2406.05130},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.05130}, 
}

Acknowledgement

Thanks to these outstanding works: LLaVA, Qwen-VL, ShareGPT4V, PEFT and LLM-Adapters.