MAPLE: MAchine translation dataset for Preference LEarning

We build MAPLE (MAchine translation dataset for Preference LEarning), a dataset derived from the WMT20/21 test sets. It contains multiple translations per source sentence, each assigned a real-valued human preference score. MAPLE covers four translation directions: German-to-English (de→en), Chinese-to-English (zh→en), English-to-German (en→de), and English-to-Chinese (en→zh). For each direction, 1.1K source sentences are sampled from the WMT20/21 test sets. Each source sentence is associated with five translations: one reference translation from WMT20/21 and four translations generated by VicunaMT, a Llama-1-based LLM finetuned for translation. Each translation is assigned a Likert score between 1 and 6 by two separate professional translators using a continuous slider interface.

Data Structure

The data is in JSON Lines format (one JSON object per line). Each line is a dictionary with the following keys (an illustrative record is shown after the list):

  1. score_annotator_a: The score assigned by the first annotator
  2. score_annotator_b: The score assigned by the second annotator
  3. score: The average of the two annotators' scores. This is the ground-truth reward (r*) used in Eq. 5
  4. system: The system name, one of {sample1, sample2, sample3, beam, reference}. sample1-sample3 are distinct samples drawn from the VicunaMT model, beam is the VicunaMT output obtained via beam search, and reference is the WMT20/21 reference translation.
  5. source: The source sentence
  6. translation: The translation
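
For illustration, a single line might look like the following record (pretty-printed here for readability; all field values are invented for this example and are not taken from the dataset):

    {
      "score_annotator_a": 4.2,
      "score_annotator_b": 5.0,
      "score": 4.6,
      "system": "sample1",
      "source": "Der schnelle braune Fuchs springt über den faulen Hund.",
      "translation": "The quick brown fox jumps over the lazy dog."
    }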

The training data maple-train.jsonl contains 22K lines (4 translation directions x 1.1K source sentences x 5 translations).
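
The snippet below is a minimal loading sketch in Python. It assumes maple-train.jsonl sits in the working directory and that preference pairs are formed by comparing the averaged score of two translations of the same source sentence; the pairing scheme and the chosen/rejected field names are illustrative, not part of the dataset specification.

    import json
    from collections import defaultdict
    from itertools import combinations

    # Group all translations by source sentence so that candidates
    # for the same input can be compared against each other.
    by_source = defaultdict(list)
    with open("maple-train.jsonl", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            by_source[record["source"]].append(record)

    # Build (preferred, rejected) pairs using the averaged human score
    # as the ground-truth reward; ties are skipped. This pairing scheme
    # is an assumption made for illustration.
    pairs = []
    for source, candidates in by_source.items():
        for a, b in combinations(candidates, 2):
            if a["score"] == b["score"]:
                continue
            preferred, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({
                "source": source,
                "chosen": preferred["translation"],
                "rejected": rejected["translation"],
            })

    print(f"{len(by_source)} source sentences, {len(pairs)} preference pairs")

The chosen/rejected naming follows the convention common in preference-tuning toolkits; any other pairing or thresholding scheme over the score field can be substituted.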

License

This data is released under the CC BY-NC 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/)