SEACrowd/.github

Create dataset loader for MLQA


Dataloader name: mlqa
Data catalogue: seacrowd.github.io/seacrowd-catalogue/card.html?mlqa

Dataset: mlqa
Description: MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. It consists of over 5K extractive QA instances per language (12K in English) in SQuAD format, covering seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese. MLQA is highly parallel: on average, each QA instance is parallel across 4 languages.
Subsets: mlqa-translate-test.vi, mlqa-translate-train.vi, mlqa.vi.ar, mlqa.vi.de, mlqa.vi.vi, mlqa.vi.zh, mlqa.vi.en, mlqa.vi.es, mlqa.vi.hi, mlqa.ar.vi, mlqa.de.vi, mlqa.zh.vi, mlqa.en.vi, mlqa.es.vi, mlqa.hi.vi
Languages: vie
License: Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage: https://github.com/facebookresearch/MLQA
HF URL: https://huggingface.co/datasets/mlqa
Paper URL: https://aclanthology.org/2020.acl-main.653
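
The subset names listed above follow two patterns, which a dataloader would probably need to tell apart. The sketch below is one possible way to parse them, not the SEACrowd implementation. The helper `parse_subset` and the dictionary keys are hypothetical, and reading `mlqa.<a>.<b>` as context language first, question language second is an assumption based on the upstream MLQA file naming; verify it against the homepage before relying on it.

```python
# Subset names copied from the issue's "Subsets" field.
SUBSETS = [
    "mlqa-translate-test.vi", "mlqa-translate-train.vi",
    "mlqa.vi.ar", "mlqa.vi.de", "mlqa.vi.vi", "mlqa.vi.zh",
    "mlqa.vi.en", "mlqa.vi.es", "mlqa.vi.hi",
    "mlqa.ar.vi", "mlqa.de.vi", "mlqa.zh.vi",
    "mlqa.en.vi", "mlqa.es.vi", "mlqa.hi.vi",
]


def parse_subset(name: str) -> dict:
    """Split a subset name into its parts (hypothetical helper).

    Assumption: in "mlqa.<a>.<b>", <a> is the context language and
    <b> is the question language.
    """
    parts = name.split(".")
    # Pattern 1: mlqa.<context_lang>.<question_lang>
    if len(parts) == 3 and parts[0] == "mlqa":
        return {
            "kind": "cross-lingual",
            "context_lang": parts[1],
            "question_lang": parts[2],
        }
    # Pattern 2: mlqa-translate-test.<lang> or mlqa-translate-train.<lang>
    if len(parts) == 2 and parts[0].startswith("mlqa-translate-"):
        return {"kind": parts[0][len("mlqa-"):], "lang": parts[1]}
    raise ValueError(f"Unrecognized MLQA subset name: {name!r}")


if __name__ == "__main__":
    for subset in SUBSETS:
        print(subset, parse_subset(subset))
```

Every subset in the list involves Vietnamese (`vi`) on at least one side, which matches the single language code `vie` in the Languages field.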