
[ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"


IGLUE: The Image-Grounded Language Understanding Evaluation Benchmark

This is the implementation of the approaches described in the paper:

Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages. In Proceedings of the 39th International Conference on Machine Learning, Jul 2022.

We provide the code for reproducing our results, as well as preprocessed data and pretrained models.

IGLUE models and tasks will also be integrated into VOLTA, upon which our repository was originally built.

Repository Setup

To set up the environment to reproduce our results, see "Repository Setup" in VOLTA's README.

Data

datasets/ contains the textual data for each dataset.

Check out its README for links to preprocessed data.

Feature extraction steps for each dataset and backbone can be found under features_extraction/.

Models

The checkpoints of all our V&L models can be downloaded from ERDA.

For more details on defining new models in VOLTA, see volta/MODELS.md.

Model configuration files are stored in volta/config/.

Training and Evaluation

We provide the scripts we used to train and evaluate models in experiments/:

  • zero_shot/: English fine-tuning and zero-shot/'translate test' evaluation
  • few_shot/: Few-shot experiments for each dataset-language-shots triplet
  • few_shot.dev-mt/: Few-shot experiments when using dev sets in the target languages (MT)
  • translate_train.de/: 'Translate train' experiments on xFlickr&CO in German
  • translate_train.ja/: 'Translate train' experiments on xFlickr&CO in Japanese
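
As a rough illustration of what a dataset-language-shots triplet means, the sketch below enumerates every combination of three lists. The function name `list_triplets` and the values passed to it are placeholders chosen for this example; they are not taken from the scripts in experiments/few_shot/.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print one
# "dataset language shots" line for every combination of the three lists.
# Each argument is a space-separated list.
list_triplets() {
  local datasets="$1"
  local languages="$2"
  local shots="$3"
  local d l s
  for d in $datasets; do
    for l in $languages; do
      for s in $shots; do
        printf '%s %s %s\n' "$d" "$l" "$s"
      done
    done
  done
}

# Example call with placeholder values (assumptions, not the repo's settings).
list_triplets "datasetA datasetB" "de ja" "1 5"
```

A full cross-product is a simplification: each dataset covers its own set of languages, so an actual run would likely pair a language list with each dataset rather than share one list across all of them.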

Task configuration files are stored in config_tasks/.
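
Before launching any script, it can help to confirm that the directories named in this README are present in your checkout. The following is a minimal sketch; `check_layout` and the `IGLUE_ROOT` variable are hypothetical names introduced here, and the directory list is copied from the sections above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which of the
# directories mentioned in this README exist under a given checkout root.
check_layout() {
  local root="${1:-.}"
  local dir
  for dir in datasets features_extraction volta/config config_tasks experiments; do
    if [ -d "$root/$dir" ]; then
      echo "ok      $dir"
    else
      echo "missing $dir"
    fi
  done
}

# IGLUE_ROOT is an assumed environment variable; defaults to the current directory.
check_layout "${IGLUE_ROOT:-.}"
```

A directory reported as missing may simply not have been downloaded or initialized yet.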

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data are subject to their respective licenses.

If you find our code/data/models or ideas useful in your research, please consider citing the paper:

@inproceedings{bugliarello-etal-2022-iglue,
  title     = {{IGLUE}: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages},
  author    = {Bugliarello, Emanuele and Liu, Fangyu and Pfeiffer, Jonas and Reddy, Siva and Elliott, Desmond and Ponti, Edoardo Maria and Vuli{\'c}, Ivan},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {2370--2392},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/bugliarello22a/bugliarello22a.pdf},
  url       = {https://proceedings.mlr.press/v162/bugliarello22a.html},
}