Data Augmentation for Low-Resource Neural Machine Translation

Abstract

  • Proposes a data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts
  • Experiments in a simulated low-resource setting for En-De and De-En show up to ~3.0 BLEU improvement over back-translation

Details

  • Translation Data Augmentation

    • Replace a word in the source sentence with a rare word, and replace the aligned word in the target sentence with its translation
    • Implausible substitutions are filtered out by a language model (LM)
    • The substitute word is chosen via the LM
    • The position of the corresponding target word is found via automatic word alignments trained over the bitext (fast_align)
  • Result

    • Better BLEU score than back-translation, but the difference is not significant
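
The substitution steps listed under Details can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `lm_score` is a stand-in callback for a real language model, `lexicon` is a hypothetical table mapping rare source words to their translations, and `threshold` is an arbitrary cutoff. Alignments are read as `i-j` pairs (source index, target index), the text format fast_align emits. As a simplification, only source positions with exactly one alignment link are considered.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Pair = Tuple[List[str], List[str]]


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse an 'i-j' alignment line such as '0-0 1-2' into (src, tgt) index pairs."""
    links = []
    for item in line.split():
        i, j = item.split("-")
        links.append((int(i), int(j)))
    return links


def augment_pair(
    src: List[str],
    tgt: List[str],
    alignment: List[Tuple[int, int]],
    lexicon: Dict[str, str],                 # hypothetical: rare source word -> translation
    lm_score: Callable[[List[str]], float],  # hypothetical LM plausibility score
    threshold: float,                        # assumed cutoff, not from the paper
) -> List[Pair]:
    """Create new sentence pairs by placing a rare word at one aligned position."""
    links_per_src = Counter(i for i, _ in alignment)
    augmented: List[Pair] = []
    for i, j in alignment:
        if links_per_src[i] != 1:
            continue  # simplification: skip one-to-many alignments
        for rare_src, rare_tgt in lexicon.items():
            if rare_src == src[i]:
                continue  # nothing to substitute
            new_src = src[:i] + [rare_src] + src[i + 1:]
            # Drop substitutions that the LM considers implausible.
            if lm_score(new_src) < threshold:
                continue
            # Replace the aligned target word with the rare word's translation.
            new_tgt = tgt[:j] + [rare_tgt] + tgt[j + 1:]
            augmented.append((new_src, new_tgt))
    return augmented


# Toy "LM": fraction of bigrams in the sentence that appear in a tiny allow-list.
SEEN_BIGRAMS = {("the", "cat"), ("cat", "slept"), ("the", "owl"), ("owl", "slept")}


def toy_lm_score(tokens: List[str]) -> float:
    bigrams = list(zip(tokens, tokens[1:]))
    return sum(b in SEEN_BIGRAMS for b in bigrams) / max(len(bigrams), 1)


result = augment_pair(
    src="the cat slept".split(),
    tgt="die Katze schlief".split(),
    alignment=parse_alignment("0-0 1-1 2-2"),
    lexicon={"owl": "Eule"},
    lm_score=toy_lm_score,
    threshold=0.9,
)
print(result)  # [(['the', 'owl', 'slept'], ['die', 'Eule', 'schlief'])]
```

In this toy run only the substitution at position 1 survives the LM filter; placing "owl" at position 0 or 2 scores 0.5 and is discarded. The sketch also makes the dependency on two external components (an LM and an aligner) concrete.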

Personal Thoughts

  • Requires both an LM and a word aligner to augment data
  • Augmentation targets rare words only; it adds no diversity in sentence structure or semantics
  • Not a good NLP augmentation method...

Link : https://arxiv.org/pdf/1705.00440.pdf
Authors : Fadaee et al. 2017