This repository contains the implementation for the paper:
Chen Qiu, Dan Oneață, Emanuele Bugliarello, Stella Frank, Desmond Elliott. Multilingual Multimodal Learning with Machine Translated Text. EMNLP, 2022.
The paper is available on arXiv.
This repository is a fork of the IGLUE codebase, which in turn depends on VOLTA. To set up the environment, please follow the instructions listed in the VOLTA README.
The machine-translated data corresponding to the Conceptual Captions dataset can be downloaded from here. The Conceptual Captions dataset contains 2.77M English sentences gathered from web-crawled alt-text and post-processed to remove proper names. We translated those sentences into the twenty languages of the IGLUE benchmark using the large M2M-100 model (1.2B parameters). Since we observed that translation quality varies across languages, we applied an automatic filtering procedure to discard poorly translated sentences (see the paper for more details); the provided data contains the filtered translations.
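For readers who want to reproduce or extend the translations, the M2M-100 step can be sketched with Hugging Face Transformers as below. This is a minimal illustration, not the repository's actual pipeline (see volta/data/conceptual_captions for that); the model identifier, batching, and the `translate_batch` helper are assumptions.

```python
def translate_batch(sentences, tgt_lang, src_lang="en",
                    model_name="facebook/m2m100_1.2B"):
    """Translate a batch of sentences with M2M-100 (hypothetical helper).

    Note: the model id and generation settings are illustrative assumptions;
    the repository's own translation code may differ.
    """
    # Local import so the sketch reads without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    tokenizer.src_lang = src_lang
    encoded = tokenizer(sentences, return_tensors="pt", padding=True)
    # M2M-100 selects the output language by forcing the first generated
    # token to be the target-language tag.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate_batch(["a dog runs on the beach"], tgt_lang="de")` would return the German translation; in practice the 2.77M sentences would be processed in larger batches on a GPU.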
We also provide translations for two of the IGLUE tasks in two variants (filtered and full):
- MaRVL (based on the NLVR2 dataset): filtered translations · full translations
- xGQA (based on the GQA dataset): filtered translations · full translations
The code to generate the translations is available in volta/data/conceptual_captions; see the corresponding README.
The visual features are the same as those used in IGLUE; see the extraction steps for each dataset and backbone under features_extraction/.
The checkpoints of all the pretrained TD-MML models will be made available shortly.
For more details on defining new models in VOLTA, see volta/MODELS.md.
Model configuration files are stored in volta/config/.
We provide the scripts we used to train and evaluate models in experiments/:
- zero_shot/: English fine-tuning and zero-shot / "translate test" evaluation
- few_shot/: Few-shot experiments for each dataset-language-shots triplet
Task configuration files are stored in config_tasks/.
This work is licensed under the MIT license. See LICENSE for details.
Third-party software and data are subject to their respective licenses.
If you find our code/data/models or ideas useful in your research, please consider citing the paper.