IGLUE: The Image-Grounded Language Understanding Evaluation Benchmark

This is the implementation of the approaches described in the paper:

Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages. In Proceedings of the 39th International Conference on Machine Learning, Jul 2022.

We provide the code for reproducing our results, preprocessed data and pretrained models.

IGLUE models and tasks will also be integrated into VOLTA, upon which our repository was originally built.

Repository Setup

To set up the environment for reproducing our results, see "Repository Setup" in VOLTA's README.

Data

datasets/ contains the textual data for each dataset.

Check out its README for links to preprocessed data.

Feature extraction steps for each dataset and backbone can be found under features_extraction/.

Models

The checkpoints of all our V&L models can be downloaded from ERDA:

For more details on defining new models in VOLTA, see volta/MODELS.md.

Model configuration files are stored in volta/config/.

Training and Evaluation

We provide the scripts we used to train and evaluate models in experiments/:

zero_shot/: English fine-tuning and zero-shot/'translate test' evaluation
few_shot/: Few-shot experiments for each dataset-language-shots triplet
few_shot.dev-mt/: Few-shot experiments when using dev sets in the target languages (MT)
translate_train.de/: 'Translate train' experiments on xFlickr&CO in German
translate_train.ja/: 'Translate train' experiments on xFlickr&CO in Japanese

Task configuration files are stored in config_tasks/.
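
Before launching any experiment, it can help to confirm that a checkout has the layout this README describes. The following is a minimal sketch, not part of the repository: the helper name missing_dirs and the list EXPECTED_DIRS are hypothetical, and the paths are assumed to be relative to the repository root exactly as they are written in this README.

```python
from pathlib import Path

# Directories named in this README, assumed relative to the repository root.
EXPECTED_DIRS = [
    "datasets",
    "features_extraction",
    "volta/config",
    "config_tasks",
    "experiments/zero_shot",
    "experiments/few_shot",
    "experiments/few_shot.dev-mt",
    "experiments/translate_train.de",
    "experiments/translate_train.ja",
]


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All directories listed in the README are present.")
```

Run from the repository root, the script prints any directory it cannot find. A missing volta/config may simply mean that VOLTA has not been set up yet (see "Repository Setup").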

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data are subject to their respective licenses.

If you find our code/data/models or ideas useful in your research, please consider citing the paper: