/vision_and_language_final


This repository reproduces the VQA experiments of the OFA model.

Following the original paper, we use the VQA v2.0 dataset and its official evaluation metric. However, the original preprocessing produces a zip file of more than 100 GB, which is difficult to download and unzip, so I use the chunked version described in OFA-Sys/OFA#68 (comment):

bash download.sh
cd dataset/vqa_data
# reassemble the chunked downloads into single TSV files
cat vqa_train_* > vqa_train.tsv
cat vqa_test_* > vqa_test.tsv
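
After concatenation, a quick spot check (not part of the original scripts) can catch a corrupted or truncated chunk: every line of the TSV should contain the same number of tab-separated fields.

# a single number in the output means the first 10k lines all have the same field count
head -n 10000 vqa_train.tsv | awk -F'\t' '{print NF}' | sort -u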

Environment

# make the fairseq bundled with OFA importable
export PYTHONPATH=$PYTHONPATH:/data/hzz5361/vision_and_lang/final/OFA/fairseq
# use the pure-Python protobuf implementation to avoid protobuf version conflicts
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
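
The fairseq path above is specific to my machine; point it at the fairseq directory inside your own OFA checkout. A quick way to confirm the export took effect:

# should print the fairseq version bundled with OFA rather than raise ImportError
python -c "import fairseq; print(fairseq.__version__)"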

Evaluation

Download Pretrained Model

The all-candidate (allcand) evaluation is very demanding on GPU memory. Since my RTX A5000 has only 24 GB, I can only run the allcand evaluation with a batch size of 4.

cd run_scripts/vqa
# beam-search evaluation on the val split
bash evaluate_vqa_beam_base.sh val
# all-candidate evaluation on the val split
bash evaluate_vqa_base_allcand.sh val
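
If 24 GB is still not enough (or you have more memory to spare), the batch size is presumably configured inside the evaluation script itself, as in the upstream OFA evaluation scripts; this locates the setting without guessing its exact variable name:

# list the lines of the allcand script that mention the batch size
grep -n -i "batch" evaluate_vqa_base_allcand.sh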

Results

Reference results reported by OFA:

| Task | Image Captioning | VQA | Visual Entailment | Referring Expression Comprehension | | |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Dataset | COCO | VQA v2 | SNLI-VE | RefCOCO | RefCOCO+ | RefCOCOg |
| Split | Karpathy test (CE/CIDEr) | test-dev / test-std | val / test | val / test-a / test-b | val / test-a / test-b | val-u / test-u |
| Metric | CIDEr | Acc. | Acc. | Acc. | Acc. | Acc. |
| OFA-Tiny | 119.0 / 128.7 | 70.3 / 70.4 | 85.3 / 85.2 | 80.20 / 84.07 / 75.00 | 68.22 / 75.13 / 57.66 | 72.02 / 69.74 |
| OFA-Medium | 130.4 / 140.3 | 75.4 / 75.5 | 86.6 / 87.0 | 85.34 / 87.68 / 77.92 | 76.09 / 83.04 / 66.25 | 78.76 / 78.58 |
| OFA-Base | 138.2 / 146.7 | 78.0 / 78.1 | 89.3 / 89.2 | 88.48 / 90.67 / 83.30 | 81.39 / 87.15 / 74.29 | 82.29 / 82.31 |
| OFA-Large | 142.2 / 150.7 | 80.4 / 80.7 | 90.3 / 90.2 | 90.05 / 92.93 / 85.26 | 85.80 / 89.87 / 79.22 | 85.89 / 86.55 |
| OFA-Huge | 145.3 / 154.9 | 82.0 / 82.0 | 91.0 / 91.2 | 92.04 / 94.03 / 88.44 | 87.86 / 91.70 / 80.71 | 88.07 / 88.78 |

VQA results reproduced in this repo (OFA-Base):

| Model | VQA v2 (Acc.) |
| :--- | :---: |
| OFA-Base (reported) | 78.0 / 78.1 |
| OFA-Base, beam 3 | 77.94 / - |
| OFA-Base, beam 10 | 77.56 / - |