IKD-MMT

Our code for the EMNLP'22 oral paper "Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation".

Primary language: Python. License: MIT.

Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation [Paper]

Foreword

I sincerely apologize for any inconvenience the previous version may have caused in terms of usability. I have been busy, but that is not a good excuse 😥. The current version has more streamlined code and more detailed usage instructions. Enjoy!

Step1: Requirements

  • Build the running environment (either of two ways)
  1. pip install --editable .
  2. python setup.py build_ext --inplace
  • pytorch==1.7.0, torchvision==0.8.0, cudatoolkit=10.1 (installing with pip also works)
  conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.1 -c pytorch
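
After installation, it can help to confirm that the pinned versions are the ones actually present. The following is a minimal sketch, not part of this repository: the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torchvision`.

```python
from importlib import metadata

# Versions pinned in the requirements above.
# Assumption: the installed distributions are named "torch" and "torchvision".
REQUIRED = {"torch": "1.7.0", "torchvision": "0.8.0"}


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, required=REQUIRED):
    """List (package, required, installed) for every package whose version differs."""
    mismatches = []
    for name, wanted in required.items():
        have = installed.get(name)
        # Some builds carry a local suffix such as "1.7.0+cu101"; compare the base part.
        base = have.split("+")[0] if have else None
        if base != wanted:
            mismatches.append((name, wanted, have))
    return mismatches


if __name__ == "__main__":
    for name, wanted, have in find_mismatches(installed_versions(REQUIRED)):
        print(f"{name}: expected {wanted}, found {have}")
```

Running the file prints one line per mismatched or missing package and nothing when the environment matches.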

Step2: Data Preparation

The dataset used in this work is Multi30K. Both the original and the preprocessed versions (the ones I used) are available here.

You can also use your own dataset; refer to experiments/prepare-iwslt14.sh or experiments/prepare-wmt14en2de.sh for how to preprocess it.

| File Name | Description | Download |
| --- | --- | --- |
| resnet50-avgpool.npy | Pre-extracted image features; each image is represented as a 2048-dimensional vector. | Link |
| Multi30K EN-DE Task | BPE+TOK text, Image Index, Label for the English-German task (including train, val, test2016/17/mscoco) | Link |
| Multi30K EN-FR Task | BPE+TOK text, Image Index, Label for the English-French task (including train, val, test2016/17/mscoco) | Link |
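
To sanity-check the downloaded feature file, something like the sketch below may be useful. It is not taken from this repository: the function names are hypothetical, and it assumes that resnet50-avgpool.npy holds a single 2-D array with one 2048-dimensional row per image and that the image index files contain 0-based row numbers. Check both assumptions against the actual data.

```python
import numpy as np


def load_image_features(path, expected_dim=2048):
    """Load pre-extracted features and check they form an (N, expected_dim) matrix."""
    features = np.load(path)
    if features.ndim != 2 or features.shape[1] != expected_dim:
        raise ValueError(f"expected shape (N, {expected_dim}), got {features.shape}")
    return features


def features_for_indices(features, indices):
    """Select one feature row per sentence.

    Assumption: `indices` are 0-based row numbers into `features`.
    """
    return features[np.asarray(indices, dtype=np.int64)]
```

For example, `features_for_indices(load_image_features("resnet50-avgpool.npy"), [0, 1, 2])` would return a `(3, 2048)` array if the assumptions above hold.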

Step3: Running code

You can run this code using the scripts in the experiments directory.

  1. Preprocess the dataset into torch format

    bash pre.sh
  2. Train the model

    bash train_res_m_l2.sh
  3. Generate target sentences

    bash gen_res_m_l2.sh
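
The three steps above can also be chained from Python so that a failure in one step stops the rest. This is only a sketch under stated assumptions: the script names come from the steps above, but `run_pipeline` and `build_commands` are hypothetical helpers, and running the scripts with the experiments directory as the working directory is an assumption, not something the repository documents.

```python
import subprocess
from pathlib import Path

# Script names are taken from the steps above, in execution order.
PIPELINE = ["pre.sh", "train_res_m_l2.sh", "gen_res_m_l2.sh"]


def build_commands(scripts=PIPELINE):
    """Return one `bash <script>` argument list per step, in order."""
    return [["bash", name] for name in scripts]


def run_pipeline(script_dir="experiments", scripts=PIPELINE):
    """Run each script with script_dir as the working directory (assumed layout)."""
    workdir = Path(script_dir)
    for command in build_commands(scripts):
        print("running:", " ".join(command))
        # check=True raises CalledProcessError if the script exits non-zero,
        # so later steps are skipped when an earlier one fails.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_pipeline()` from the repository root would then run preprocessing, training, and generation in sequence.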

Citation

If you use the code in your research, please cite:

@inproceedings{peng-etal-2022-distill,
    title = "Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation",
    author = "Peng, Ru  and
      Zeng, Yawen  and
      Zhao, Jake",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    pages = "2379--2390",
}