(ICML 2024) Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
Download Llama-2-7B from https://huggingface.co/meta-llama/Llama-2-7b
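One way to fetch the weights is the official Hugging Face CLI (an assumption, not part of this repo; the model is gated, so you must accept the license on the model page and run `huggingface-cli login` first):

```shell
# Sketch: download the gated Llama-2-7b weights with the Hugging Face CLI.
MODEL_ID="meta-llama/Llama-2-7b"
DEST="llama2-7b"
# Actual download (commented out; run after accepting the license and logging in):
# huggingface-cli download "$MODEL_ID" --local-dir "$DEST"
echo "Will download $MODEL_ID into $DEST"
```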
# download features from https://drive.google.com/file/d/1_3UtErJzCcqpJ3Ie0l7uSvKUuj8H201t/view?usp=sharing
tar -xf features.tar
cd dataset
Download the MMC dataset from https://drive.google.com/file/d/1S_9ktojv3DPr7V26BtR1uoQkMXvGUBw3/view?usp=sharing
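The Google Drive links in this README can be fetched from the command line with gdown (`pip install gdown`; an assumed helper, not part of this repo). The file ID is the path segment after `/d/` in the share link:

```shell
# Extract the Drive file ID from a share link, then fetch it with gdown.
URL="https://drive.google.com/file/d/1S_9ktojv3DPr7V26BtR1uoQkMXvGUBw3/view?usp=sharing"
FILE_ID=$(echo "$URL" | sed -E 's#.*/d/([^/?]+).*#\1#')
echo "$FILE_ID"   # → 1S_9ktojv3DPr7V26BtR1uoQkMXvGUBw3
# gdown "$FILE_ID" -O mmc_dataset.tar   # uncomment to download (output name is an assumption)
```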
Please email me if you require the CC3M dataset.
# train with 4 gpus
./train.sh
cd repos
git clone https://github.com/miccunifi/CIRCO.git
git clone -b cirr_dataset git@github.com:Cuberick-Orion/CIRR.git cirr
# download pretrained models
cd runs
Download the checkpoint from https://drive.google.com/file/d/1oYfffZ6ckVjSbxG9rk0h5OACYfEWIWxv/view?usp=sharing
tar -xf icml_run.tar
# download data
cd data
Download the MSCOCO unlabeled images from https://cocodataset.org/#download
Download NLVR2 from https://huggingface.co/datasets/TIGER-Lab/NLVR2
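A sketch of the two downloads (the `unlabeled2017.zip` archive name follows cocodataset.org's standard naming but is an assumption; verify it on the download page). The commands are commented out because both transfers are large:

```shell
# COCO unlabeled split:
COCO_URL="http://images.cocodataset.org/zips/unlabeled2017.zip"
# wget -c "$COCO_URL" && unzip -q unlabeled2017.zip
# NLVR2 from the Hugging Face Hub (requires huggingface_hub):
# huggingface-cli download TIGER-Lab/NLVR2 --repo-type dataset --local-dir nlvr2
echo "COCO archive: $COCO_URL"
```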
# CIR inference
Run cir_inference.ipynb for composed image retrieval (CIR) inference.
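If you prefer not to open Jupyter interactively, the notebook can also be executed headlessly with nbconvert (assumes `jupyter` is installed; the output filename is an assumption):

```shell
NB="cir_inference.ipynb"
# jupyter nbconvert --to notebook --execute "$NB" --output cir_inference_out.ipynb
echo "notebook: $NB"
```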
This repo is built upon FROMAGe.
@inproceedings{li2024improving,
title={Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning},
author={Li, Wei and Fan, Hehe and Wong, Yongkang and Yang, Yi and Kankanhalli, Mohan},
booktitle={International Conference on Machine Learning (ICML)},
year={2024},
}