PyTorch implementation of MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering (CVPR 2023).
This repository is based on VL-T5. The implementation on X-VLM can be found here.
We use PyTorch 1.10.0 and transformers 4.15.0. See requirements.txt for the remaining dependencies.
pip install -r requirements.txt
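After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of this repository: it assumes Python 3.8+ (for importlib.metadata) and that the packages are installed under the PyPI names torch and transformers. The helper names installed_version and version_mismatches are hypothetical.

```python
from importlib import metadata

# Versions stated in this README; the distribution names are assumed to be the PyPI ones.
EXPECTED = {"torch": "1.10.0", "transformers": "4.15.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Local build tags such as "1.10.0+cu113" still count as 1.10.0.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result means both packages match. Other versions may still work, but this README only states the two above.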
Please see data/README.md to prepare datasets.
├── data
│   ├── annotation
│   │   ├── answer_list.json
│   │   ├── gqa
│   │   │   ├── testdev.json
│   │   │   ├── train.json
│   │   │   ├── trainval_ans2label.json
│   │   │   ├── trainval_label2ans.json
│   │   │   └── valid.json
│   │   ├── lxmert_split
│   │   │   ├── minival.json
│   │   │   ├── nominival.json
│   │   │   ├── test.json
│   │   │   ├── train.json
│   │   │   └── val.json
│   │   ├── okvqa
│   │   │   ├── mscoco_train2014_annotations.json
│   │   │   ├── mscoco_val2014_annotations.json
│   │   │   ├── train.json
│   │   │   ├── trainval_ans2label.json
│   │   │   ├── trainval_label2ans.json
│   │   │   └── val.json
│   │   └── vqav2
│   │       ├── trainval_ans2label.json
│   │       ├── trainval_label2ans.json
│   │       ├── v2_mscoco_train2014_annotations.json
│   │       ├── v2_mscoco_val2014_annotations.json
│   │       └── val.json
│   ├── coco_imgfeat
│   │   ├── train_obj36.h5
│   │   └── val_obj36.h5
│   └── vg_imgfeat
│       ├── vg_gqa_obj36.h5
│       └── gqa_testdev_obj36.h5
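Before launching any experiment, the layout above can be checked with a short script. This is a sketch under assumptions rather than a tool shipped with the repository: the file names are copied from the tree, the last five annotation files are assumed to sit inside vqav2, and the names EXPECTED_FILES and missing_files are hypothetical.

```python
from pathlib import Path

# Directory -> file names, copied from the tree above (relative to data/).
LAYOUT = {
    "annotation": ["answer_list.json"],
    "annotation/gqa": [
        "testdev.json", "train.json", "trainval_ans2label.json",
        "trainval_label2ans.json", "valid.json",
    ],
    "annotation/lxmert_split": [
        "minival.json", "nominival.json", "test.json", "train.json", "val.json",
    ],
    "annotation/okvqa": [
        "mscoco_train2014_annotations.json", "mscoco_val2014_annotations.json",
        "train.json", "trainval_ans2label.json", "trainval_label2ans.json",
        "val.json",
    ],
    # Assumption: these five files live inside vqav2.
    "annotation/vqav2": [
        "trainval_ans2label.json", "trainval_label2ans.json",
        "v2_mscoco_train2014_annotations.json",
        "v2_mscoco_val2014_annotations.json", "val.json",
    ],
    "coco_imgfeat": ["train_obj36.h5", "val_obj36.h5"],
    "vg_imgfeat": ["vg_gqa_obj36.h5", "gqa_testdev_obj36.h5"],
}

# Flatten into relative paths such as "annotation/gqa/train.json".
EXPECTED_FILES = [f"{folder}/{name}" for folder, names in LAYOUT.items() for name in names]


def missing_files(data_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("data")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("All expected files are present.")
```

The script only checks that the files exist. It does not validate their contents.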
- Experiments on the OK-VQA dataset.
bash scripts/okvqa_vlt5_mixphm.sh $GPU_IDS $num_GPU
- Experiments on the VQA v2 dataset.
bash scripts/vqav2_vlt5_mixphm.sh $GPU_IDS $num_GPU
- Experiments on the GQA dataset.
bash scripts/gqa_vlt5_mixphm.sh $GPU_IDS $num_GPU
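The three commands above share the same two arguments, so the command line can be assembled programmatically. The sketch below is illustrative only and makes an assumption this README does not confirm: that $GPU_IDS is a comma-separated device list (as with CUDA_VISIBLE_DEVICES) and $num_GPU is the number of devices in that list. Check the scripts themselves before relying on it. The helper build_command is hypothetical.

```python
# Script paths as given in this README.
SCRIPTS = {
    "okvqa": "scripts/okvqa_vlt5_mixphm.sh",
    "vqav2": "scripts/vqav2_vlt5_mixphm.sh",
    "gqa": "scripts/gqa_vlt5_mixphm.sh",
}


def build_command(dataset, gpu_ids):
    """Build the argument list for one experiment.

    Assumption: the first script argument is a comma-separated GPU list
    and the second is the GPU count.
    """
    if dataset not in SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    ids = [int(g) for g in gpu_ids]
    if not ids:
        raise ValueError("at least one GPU id is required")
    return ["bash", SCRIPTS[dataset], ",".join(str(i) for i in ids), str(len(ids))]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to launch it.
    print(" ".join(build_command("okvqa", [0, 1])))
```

Deriving the count from the list keeps the two arguments from drifting apart when the device selection changes.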
We acknowledge the use of the following public code in this project: VL-T5, Adapters, compacter, LoRA, AdaMix.