kweonwooj/papers

Data Augmentation for Low-Resource Neural Machine Translation

kweonwooj commented 7 years ago

Abstract

Proposes a novel data augmentation approach that targets low-frequency words by generating new sentence pairs that place rare words in new, synthetically created contexts
Experimental results on a simulated low-resource setting for En-De/De-En show ~3.0 BLEU improvement over back-translation

Details

Translation Data Augmentation
- Replace a word in both the source and target sentences
- Improper substitutions are filtered out by a language model (LM)
- The rare word to substitute in is chosen via the LM
- The position of the corresponding word on the other side is found via automatic word alignments trained over the bitext (fast_align)
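The steps above can be sketched roughly as follows. This is only an illustration of the idea, not the authors' implementation: `augment_pair`, `lm_accepts`, `lexicon`, and the toy bigram check are all hypothetical names and stand-ins. A real setup would use a trained LM to propose and score substitutions, and alignments produced by fast_align rather than the hand-written dictionary used here.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[List[str], List[str]]


def augment_pair(
    src: List[str],
    tgt: List[str],
    alignment: Dict[int, int],   # source index -> target index (stand-in for aligner output)
    rare_words: Set[str],        # low-frequency source words we want more contexts for
    lexicon: Dict[str, str],     # rare source word -> translation (assumed to be given)
    lm_accepts: Callable[[List[str], int], bool],  # stand-in for the LM fluency check
) -> List[Pair]:
    """Create new sentence pairs by substituting a rare word at one position."""
    augmented: List[Pair] = []
    for i in range(len(src)):
        j = alignment.get(i)
        if j is None:
            continue  # no aligned target word, so the target side cannot be updated
        for rare in sorted(rare_words):
            if rare == src[i] or rare not in lexicon:
                continue
            new_src = src[:i] + [rare] + src[i + 1:]
            if not lm_accepts(new_src, i):
                continue  # the LM rejects this substitution as improper
            new_tgt = tgt[:j] + [lexicon[rare]] + tgt[j + 1:]
            augmented.append((new_src, new_tgt))
    return augmented


# Toy stand-in for a language model: accept a substitution only if the
# (previous word, new word) bigram is in a small hand-written whitelist.
ALLOWED_BIGRAMS = {("a", "manuscript")}


def toy_lm_accepts(tokens: List[str], i: int) -> bool:
    return i > 0 and (tokens[i - 1], tokens[i]) in ALLOWED_BIGRAMS


src = "the man reads a book".split()
tgt = "der Mann liest ein Buch".split()
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}  # hand-written, monotone alignment

result = augment_pair(
    src,
    tgt,
    alignment,
    rare_words={"manuscript"},
    lexicon={"manuscript": "Manuskript"},
    lm_accepts=toy_lm_accepts,
)

for new_src, new_tgt in result:
    print(" ".join(new_src), "|||", " ".join(new_tgt))
```

With this toy data, only the substitution of "book" is kept, because the whitelist rejects "manuscript" in every other position.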
Result
- Better BLEU score than back-translation, but the difference is not significant

Personal Thoughts

You need both an LM and a word aligner to augment the data
The augmentation focuses on rare words only; it adds no diversity in sentence structure or semantics
Not a good NLP augmentation method...

Link : https://arxiv.org/pdf/1705.00440.pdf
Authors : Fadaee et al. 2017