
Learning multimodal multilingual models with machine translated data


TD-MML: Translated Data for Multilingual Multimodal Learning

This repository contains the implementation for the paper:

Chen Qiu, Dan Oneață, Emanuele Bugliarello, Stella Frank, Desmond Elliott. Multilingual Multimodal Learning with Machine Translated Text. EMNLP, 2022.

The paper is available on arXiv.

Setup

This repository is a fork of the IGLUE codebase, which in turn depends on VOLTA. To set up the environment, please follow the instructions listed in the VOLTA README.

Data

The machine translated data corresponding to the Conceptual Captions dataset can be downloaded from here. The Conceptual Captions dataset contains 2.77M English sentences gathered from web-crawled alt-text and post-processed to remove proper names. We translated these sentences into the twenty languages of the IGLUE benchmark using the large M2M-100 model (1.2B parameters). Since we observed that translation quality varies across languages, we applied an automatic filtering procedure to discard poor translations (see the paper for more details); the provided data contains the filtered translations.
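The actual filtering procedure is described in the paper and is not reproduced here. Purely to illustrate what an automatic filter over (source, translation) pairs can look like, the sketch below uses two common heuristics instead: a character length-ratio check and a check for degenerate repeated-token output. The names (`Pair`, `length_ratio_ok`, `repetition_ok`, `filter_pairs`) and the thresholds are hypothetical and are not part of this repository or the paper's method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Pair:
    source: str       # English caption (hypothetical record type)
    translation: str  # machine translation into one target language


def length_ratio_ok(pair: Pair, max_ratio: float = 2.5) -> bool:
    """Reject pairs whose character lengths differ by more than max_ratio."""
    src_len = len(pair.source.strip())
    tgt_len = len(pair.translation.strip())
    if src_len == 0 or tgt_len == 0:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio


def repetition_ok(pair: Pair, max_repeat_frac: float = 0.5, min_tokens: int = 4) -> bool:
    """Reject translations dominated by one repeated token (a typical MT failure).

    Very short outputs are exempt, since one token can legitimately
    make up a large fraction of them.
    """
    tokens = pair.translation.split()
    if not tokens:
        return False
    if len(tokens) < min_tokens:
        return True
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens) <= max_repeat_frac


def filter_pairs(pairs: list[Pair]) -> list[Pair]:
    """Keep only the pairs that pass every heuristic check."""
    return [p for p in pairs if length_ratio_ok(p) and repetition_ok(p)]


pairs = [
    Pair("a dog running on the beach", "un perro corriendo en la playa"),
    Pair("a dog running on the beach", "perro perro perro perro perro perro"),
    Pair("a dog running on the beach", "sí"),
    Pair("a dog running on the beach", ""),
]
kept = filter_pairs(pairs)
```

Here only the first pair survives: the second is a degenerate repetition, the third is far too short relative to its source, and the fourth is empty. A real filter would typically combine such cheap checks with a model-based quality score.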

We also provide translations for two of the IGLUE tasks in two variants (filtered and full):

The code to generate the translations is available in volta/data/conceptual_captions; see the corresponding README.

The visual features are the same as those used in IGLUE; see the extraction steps for each dataset and backbone under features_extraction/.

Models

The checkpoints of all the pretrained TD-MML models will be made available shortly.

For more details on defining new models in VOLTA, see volta/MODELS.md.

Model configuration files are stored in volta/config/.

Training and Evaluation

We provide the scripts we used to train and evaluate models in experiments/:

  • zero_shot/: English fine-tuning and zero-shot/'translate test' evaluation
  • few_shot/: Few-shot experiments for each dataset-language-shots triplet
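Since each few-shot run is identified by a dataset-language-shots triplet, the full set of runs is the Cartesian product of the three lists. The sketch below enumerates them with `itertools.product`; the dataset names, language codes and shot counts are placeholders, not the values used in experiments/few_shot/, and `few_shot_triplets` is a hypothetical helper.

```python
from itertools import product

# Placeholder values for illustration only; the real datasets, languages
# and shot counts are set by the scripts in experiments/few_shot/.
DATASETS = ["dataset_a", "dataset_b"]
LANGUAGES = ["de", "id"]
SHOTS = [1, 5]


def few_shot_triplets(datasets, languages, shots):
    """Return one (dataset, language, n_shots) tuple per few-shot run."""
    return list(product(datasets, languages, shots))


runs = few_shot_triplets(DATASETS, LANGUAGES, SHOTS)
for dataset, lang, n_shots in runs:
    # e.g. a run name such as "dataset_a-de-1shot"
    print(f"{dataset}-{lang}-{n_shots}shot")
```

With two datasets, two languages and two shot settings this yields 2 × 2 × 2 = 8 runs, which is why few-shot sweeps grow quickly as languages are added.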

Task configuration files are stored in config_tasks/.

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data are subject to their respective licenses.

If you find our code/data/models or ideas useful in your research, please consider citing the paper.