Guiding Neural Machine Translation with Retrieved Translation Pieces

Abstract

  • proposes an effective method to incorporate retrieved sentence pairs into the NMT decoding process
    • use a search engine to retrieve sentence pairs whose source side is similar to the input sentence
    • collect n-gram translation pieces from the target side of retrieved pairs where similarity and alignment scores are high
    • reward translation pieces during the NMT beam search decoding process
  • up to +6.0 BLEU improvement on a narrow-domain translation task
  • careful algorithm design achieves accuracy, decoding speed, and simplicity of implementation

Details

  • Problem
    • NMT is weak at translating low-frequency words or phrases
  • Retrieval-based Model
    • an active research area in which the NMT system retrieves similar sentence pairs from the training corpus at translation time
    • augments the parametric NMT model with a non-parametric translation memory, allowing for increased capacity
    • Two main approaches
      • Li et al. 2016 and Farajian et al. 2017 use the retrieved sentence pairs to fine-tune the parameters of the NMT model
      • Gu et al. 2017 use the retrieved sentence pairs as additional inputs to NMT decoding
  • Contribution
    • existing methods perform well but add significant complexity and computational/memory cost to the decoding process
    • this paper proposes a simple and efficient method: collect n-grams from the retrieved target sentences (translation pieces), weight each piece with a pseudo-probability based on source-side similarity, and reward the NMT model for outputting translation pieces during beam search decoding

Guiding NMT with Translation Pieces

  1. use the Lucene search engine to retrieve M source sentences (with their target translations) that have n-gram overlap with the input sentence
  2. among all n-grams (up to 4-grams) in the retrieved target sentences, collect translation pieces using word alignments, and score each piece by the similarity between the input sentence and the retrieved source sentence (see the first sketch after this list)
  3. in the beam search decoding process, translation pieces are given additive rewards proportional to their scores (see the second sketch after this list)
  4. the reward process is implemented efficiently: it does not traverse the whole target vocabulary V, but only the target words that belong to translation pieces
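A minimal Python sketch of steps 1–2 (not the authors' code): it assumes the paper's sentence similarity simi(X, X^m) = 1 − d(X, X^m) / max(|X|, |X^m|), with d a word-level edit distance, and abstracts the word-alignment filter into a per-token `keep` mask marking target words aligned to matched source words. Each piece u is scored S_X(u) = max over retrieved sentences containing u of simi(X, X^m), i.e. by its single closest match.

```python
def edit_distance(a, b):
    # word-level Levenshtein distance, single-row dynamic programming
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (wa != wb))
    return dp[-1]

def simi(x, x_m):
    # similarity between input X and retrieved source X^m
    return 1.0 - edit_distance(x, x_m) / max(len(x), len(x_m))

def collect_translation_pieces(x, retrieved, max_n=4):
    """Collect scored n-gram translation pieces from retrieved pairs.

    retrieved: list of (x_m, y_m, keep), where keep[j] is True if y_m[j]
    is aligned to a source word that also matches the input (the paper's
    actual alignment logic is abstracted away here).
    """
    pieces = {}
    for x_m, y_m, keep in retrieved:
        s = simi(x, x_m)
        for n in range(1, max_n + 1):
            for j in range(len(y_m) - n + 1):
                if all(keep[j:j + n]):
                    u = tuple(y_m[j:j + n])
                    pieces[u] = max(pieces.get(u, 0.0), s)  # S_X(u) = max_m simi
    return pieces

# toy usage with one retrieved pair and a fully "kept" target side
x = "the committee shall adopt its rules of procedure".split()
retrieved = [("the committee shall adopt its own rules".split(),
              "der ausschuss gibt sich eine geschaeftsordnung".split(),
              [True] * 6)]
print(collect_translation_pieces(x, retrieved))
```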
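A companion sketch of steps 3–4: indexing translation pieces by their final word means each decoding step only touches candidate words that end some piece, never the full vocabulary. The reward weight `lam` (λ, tuned on dev data in the paper) and the dict-based `log_probs` interface are simplifications of mine, not the paper's implementation.

```python
from collections import defaultdict

def index_by_last_word(pieces):
    # map final word -> [(n-gram, score)] so beam search can skip
    # every vocabulary word that ends no translation piece
    index = defaultdict(list)
    for ngram, score in pieces.items():
        index[ngram[-1]].append((ngram, score))
    return index

def reward_step(log_probs, history, index, lam=1.0, max_n=4):
    """Add translation-piece rewards to one step's log-probabilities.

    log_probs: dict word -> log P(word | history) for this hypothesis
    history:   tuple of target words generated so far
    """
    context = tuple(history[-(max_n - 1):])
    for word, candidates in index.items():
        if word not in log_probs:
            continue
        bonus = 0.0
        for ngram, score in candidates:
            prefix = ngram[:-1]  # the piece matches if its prefix ends the history
            if prefix == () or context[len(context) - len(prefix):] == prefix:
                bonus += score
        log_probs[word] += lam * bonus
    return log_probs
```

Summing the scores of every matched n-gram that ends at the candidate word mirrors the paper's reward, and the per-step cost scales with the number of piece words rather than with |V|.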

Experiments

  • corpus: JRC-Acquis, a narrow-domain corpus of legislative text, ~670k sentence pairs
  • result: up to +6.0 BLEU over the baseline NMT

Ablation Experiments

  • Effect of look-up corpus

    • similarity between the test set and the look-up corpus is an important factor in the performance of guided NMT
    • on the WMT17 En-De news translation task, the method does not achieve significant improvement over the baseline because the train and test data distributions differ: as shown in Table 8, WMT similarity scores are concentrated in the 0.2–0.4 range
  • Infrequent n-grams

    • guided NMT produces more infrequent n-grams (count < 5) in its decoded output than the baseline NMT, showing that the algorithm fulfills its original motivation
    • if an n-gram does not occur in the look-up corpus, no reward is added and no improvement is seen
  • vs. Search Engine Guided NMT (SEG-NMT) by Gu et al. 2017

    • the proposed approach performs better and is faster at decoding
    • SEG-NMT requires encoding/decoding of the retrieved sentence pairs at translation time, which is costly

Personal Thoughts

  • well-written paper, thorough experiments, in-depth analysis
  • Algorithm 2 is an efficient way to reward/punish n-gram outputs during beam search
  • the consideration given to practical implementation is impressive
  • hope infrequent n-grams can be handled well in general-purpose translation tasks (e.g., the WMT news task)

Link: https://arxiv.org/pdf/1804.02559v1.pdf
Authors: Zhang et al. 2018