Official code and data for the ACL 2024 Findings paper, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models".
Paper link: https://arxiv.org/abs/2406.05130
- Clone this repository:

```bash
git clone https://github.com/alenai97/PEFT-MLLM.git
cd PEFT-MLLM
```
- Install dependencies:

```bash
conda create -n peft-mllm python=3.10 -y
conda activate peft-mllm
pip install --upgrade pip
pip install -e .
pip install flash-attn --no-build-isolation
```
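Optionally, a quick sanity check that the environment built correctly (this assumes a CUDA-capable GPU is visible; `flash_attn` is the module name the wheel installs):

```bash
# Confirm PyTorch sees a GPU and that flash-attn imports without errors.
python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"
```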
- Additional packages (install the repository's local copies of PEFT and transformers):

```bash
# PEFT
cd peft
pip install -e .
cd ..

# transformers
cd transformers
pip install -e .
cd ..
```
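To confirm both editable installs took effect, the printed paths should point into this repository's `peft/` and `transformers/` directories:

```bash
# Verify that peft and transformers resolve to the local editable checkouts.
python -c "import peft, transformers; print(peft.__file__, transformers.__file__)"
```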
- Data Preparation

Please download the following datasets: ScienceQA, Flickr30K, IconQA, VizWiz, OCR-VQA, OK-VQA, and VQAv2, then organize them under `datasets/` as follows:
```
.
├── scienceqa
│   └── train
├── flickr30k
│   └── train
├── vizwiz
│   └── train
├── okvqa
│   └── train
├── ocrvqa
│   └── train
├── vqav2
│   └── train
└── iconqa
    ├── choose_txt
    │   └── train
    └── fill_in_blank
        └── train
```
For the data format, please refer to LLaVA and Qwen-VL. You can also follow the examples in `datasets/scienceqa/train_sqa_llava.json` and `datasets/scienceqa/train_sqa_qwen.json`.
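For orientation, here is a minimal sketch of a LLaVA-style training record; the field names follow the public LLaVA conversation format, and the `id`, `image` path, and text below are hypothetical placeholders (check `train_sqa_llava.json` for the exact schema):

```json
[
  {
    "id": "sqa-example-0",
    "image": "scienceqa/train/1/image.png",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhich property do these objects have in common?\nOptions: (A) hard (B) soft"
      },
      {
        "from": "gpt",
        "value": "The answer is (A)."
      }
    ]
  }
]
```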
- Start fine-tuning

All training scripts are in `scripts/`; for example, start with `scripts/llava/peft_lora.sh`.
To freeze the connector, add the following flag:
- `--freeze_mm_mlp_adapter True` for LLaVA-1.5 and ShareGPT4V.
- `--freeze_connector True` for Qwen-VL-Chat.
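As a hedged sketch, assuming `peft_lora.sh` forwards extra command-line arguments to the underlying training command (if it does not, append the flag directly to the training command inside the script):

```bash
# Hypothetical invocation: LoRA fine-tuning of LLaVA-1.5 with the connector
# (multimodal MLP adapter) frozen. Assumes the script passes "$@" through to
# the trainer; otherwise, edit the script and add the flag there.
bash scripts/llava/peft_lora.sh --freeze_mm_mlp_adapter True
```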
The evaluation code will be released soon.
📅 Plan: We plan to implement additional PEFT methods for the MLLM community in the future.
- Citation

```bibtex
@misc{zhou2024empiricalstudyparameterefficientfinetuning,
      title={An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models},
      author={Xiongtao Zhou and Jie He and Yuhua Ke and Guangyao Zhu and Víctor Gutiérrez-Basulto and Jeff Z. Pan},
      year={2024},
      eprint={2406.05130},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.05130},
}
```
Thanks to these outstanding works: LLaVA, Qwen-VL, ShareGPT4V, PEFT, and LLM-Adapters.