We introduce 3AM, an ambiguity-aware multimodal machine translation (MMT) dataset consisting of ~26K image-text pairs. Compared with previous MMT datasets, ours covers a greater diversity of caption styles and a wider range of visual concepts. Please see our paper for more details.
Please download the dataset here. The text data is also available in the data folder.
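For a quick look at the text data, something like the following should work (file names and layout below are assumptions; check the data folder for the actual contents):
# list the files shipped with the repository (actual names may differ)
ls data
# peek at the first few lines of each split
head -n 3 data/*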
The code for training the selective attention model, which is based on fairseq-mmt, is available here.
# train
bash train_mmt.sh
# test
bash translate_mmt.sh
The code for training the VL-Bart and VL-T5 models, which is based on VL-T5, is available here.
# VL-Bart
bash scripts/MMT_VLBart.sh
# VL-T5
bash scripts/MMT_VLT5.sh
If you have any questions, please email yc27434@umac.mo.
If you use this dataset in your research, please cite:
TODO