Create dataset loader for MLQA
SamuelCahyawijaya commented
Dataloader name: mlqa
Data catalogue: seacrowd.github.io/seacrowd-catalogue/card.html?mlqa
Dataset | mlqa |
---|---|
Description | MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. It contains over 5K extractive QA instances per language (12K in English) in SQuAD format, covering seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with each QA instance parallel across 4 languages on average. |
Subsets | mlqa-translate-test.vi, mlqa-translate-train.vi, mlqa.vi.ar, mlqa.vi.de, mlqa.vi.vi, mlqa.vi.zh, mlqa.vi.en, mlqa.vi.es, mlqa.vi.hi, mlqa.ar.vi, mlqa.de.vi, mlqa.zh.vi, mlqa.en.vi, mlqa.es.vi, mlqa.hi.vi |
Languages | vie |
License | Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0) |
Homepage | https://github.com/facebookresearch/MLQA |
HF URL | https://huggingface.co/datasets/mlqa |
Paper URL | https://aclanthology.org/2020.acl-main.653 |
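
The subset names listed above follow a regular pattern, so a loader could derive the language pair from the name instead of hard-coding each case. Below is a minimal sketch of that idea. It assumes the first language code in `mlqa.<a>.<b>` is the context language and the second is the question language (the convention the Hugging Face loader appears to use; please verify before relying on it). `MlqaSubset` and `parse_subset` are hypothetical names introduced for illustration, not part of any existing dataloader.

```python
from typing import NamedTuple, Optional


class MlqaSubset(NamedTuple):
    """Hypothetical container for the parts of an MLQA subset name."""
    kind: str                     # "mlqa", "translate-test" or "translate-train"
    context_lang: Optional[str]   # assumed: first code in mlqa.<a>.<b>
    question_lang: Optional[str]  # assumed: second code in mlqa.<a>.<b>
    target_lang: Optional[str]    # language code of a translate-* subset


# Subset names copied from the table in this issue.
SUBSETS = [
    "mlqa-translate-test.vi", "mlqa-translate-train.vi",
    "mlqa.vi.ar", "mlqa.vi.de", "mlqa.vi.vi", "mlqa.vi.zh",
    "mlqa.vi.en", "mlqa.vi.es", "mlqa.vi.hi",
    "mlqa.ar.vi", "mlqa.de.vi", "mlqa.zh.vi",
    "mlqa.en.vi", "mlqa.es.vi", "mlqa.hi.vi",
]


def parse_subset(name: str) -> MlqaSubset:
    """Split a subset name into its kind and language codes."""
    prefix, *langs = name.split(".")
    if prefix == "mlqa" and len(langs) == 2:
        return MlqaSubset("mlqa", langs[0], langs[1], None)
    if prefix in ("mlqa-translate-test", "mlqa-translate-train") and len(langs) == 1:
        # Strip the leading "mlqa-" to get "translate-test" / "translate-train".
        return MlqaSubset(prefix[len("mlqa-"):], None, None, langs[0])
    raise ValueError(f"Unrecognised MLQA subset name: {name!r}")


if __name__ == "__main__":
    for subset_name in SUBSETS:
        print(subset_name, "->", parse_subset(subset_name))
```

Every subset in the list involves Vietnamese (`vi`) on at least one side, which matches the `vie` language entry in the table.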