google-research/xtreme

Translation data

panl2015 opened this issue · 4 comments

Hi, thank you for releasing the translation data along with the benchmarks! I'm looking at the translation data for SQuAD train that you provide: https://console.cloud.google.com/storage/browser/xtreme_translations/SQuAD/translate-train/ . Based on my understanding of the paper, it should cover all the languages in MLQA and XQuAD in the translate-train setting, but several are missing from this folder. Do you plan to provide data for the rest of the languages? Thanks!

Hi, sorry about that. There was an issue during the file upload that I missed. The translation data for the SQuAD training file should now be complete. Let us know if any other files are missing.

Hi @sebastianruder @JunjieHu It looks like the translation data for training XNLI (https://console.cloud.google.com/storage/browser/xtreme_translations/XNLI/translate-train/) do not appear in the same order as the original English data in MultiNLI. Is there a way to know which English examples the translations are from? Thanks.

Hi @panl2015,

We've now updated the translation data for XNLI and PAWS-X to also contain two columns for the original input. Now you can match the translations with their original text.

Please take a look at the data and let us know if this looks good by reopening this issue.
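For anyone who wants to do the matching programmatically, here is a minimal sketch of aligning translated rows back to the original English examples. It assumes a tab-separated file in which the first two columns hold the original English premise and hypothesis; that column order, the sample rows, and the function names `index_english` and `align_translations` are assumptions for illustration, not the confirmed file layout, so check a few rows of the downloaded file first.

```python
import csv
import io
from collections import defaultdict


def index_english(pairs):
    """Map each (premise, hypothesis) pair to the positions where it occurs."""
    index = defaultdict(list)
    for pos, (premise, hypothesis) in enumerate(pairs):
        index[(premise.strip(), hypothesis.strip())].append(pos)
    return index


def align_translations(tsv_text, english_pairs, src_cols=(0, 1)):
    """Return (original_position, row) for every row of the translated file.

    src_cols names the two columns assumed to hold the original English
    text (hypothetical layout). The position is None if no match is found.
    """
    index = index_english(english_pairs)
    aligned = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        key = (row[src_cols[0]].strip(), row[src_cols[1]].strip())
        positions = index.get(key, [])
        # Duplicated English pairs are ambiguous; this takes the first one.
        aligned.append((positions[0] if positions else None, row))
    return aligned


# Made-up rows standing in for the English data and a translated file.
english = [("A man eats.", "Someone eats."), ("It rains.", "It is dry.")]
tsv = (
    "It rains.\tIt is dry.\tEs regnet.\tEs ist trocken.\tcontradiction\n"
    "A man eats.\tSomeone eats.\tEin Mann isst.\tJemand isst.\tentailment\n"
)
aligned = align_translations(tsv, english)
positions = [pos for pos, _ in aligned]
```

Matching on the exact text is the simplest option, but it breaks if whitespace or escaping differs between the two files, so rows that come back with `None` are worth inspecting by hand.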