Create dataset loader for MLQA

Dataloader name: mlqa
Data catalogue: seacrowd.github.io/seacrowd-catalogue/card.html?mlqa

Dataset	mlqa
Description	MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. It consists of over 5K extractive QA instances per language (12K in English) in SQuAD format, covering seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese. MLQA is highly parallel: each QA instance is available in 4 different languages on average.
Subsets	mlqa-translate-test.vi, mlqa-translate-train.vi, mlqa.vi.ar, mlqa.vi.de, mlqa.vi.vi, mlqa.vi.zh, mlqa.vi.en, mlqa.vi.es, mlqa.vi.hi, mlqa.ar.vi, mlqa.de.vi, mlqa.zh.vi, mlqa.en.vi, mlqa.es.vi, mlqa.hi.vi
Languages	vie
License	Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage	https://github.com/facebookresearch/MLQA
HF URL	https://huggingface.co/datasets/mlqa
Paper URL	https://aclanthology.org/2020.acl-main.653
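
For anyone picking up this dataloader, the subset names in the table follow two patterns: `mlqa.<code>.<code>` and `mlqa-translate-{train,test}.<code>`. Below is a minimal sketch of how they could be split into parts when building the loader configs. The helper `parse_subset` and the `SubsetInfo` type are hypothetical names, not part of SEACrowd or the Hugging Face loader. The reading of the two codes as context language followed by question language is an assumption based on the upstream naming, so please verify it against the source files before relying on it.

```python
from typing import NamedTuple, Optional

# Subset names copied from the "Subsets" row of the table above.
SUBSETS = [
    "mlqa-translate-test.vi", "mlqa-translate-train.vi",
    "mlqa.vi.ar", "mlqa.vi.de", "mlqa.vi.vi", "mlqa.vi.zh",
    "mlqa.vi.en", "mlqa.vi.es", "mlqa.vi.hi",
    "mlqa.ar.vi", "mlqa.de.vi", "mlqa.zh.vi",
    "mlqa.en.vi", "mlqa.es.vi", "mlqa.hi.vi",
]


class SubsetInfo(NamedTuple):
    """Hypothetical container for the parts of a subset name."""
    kind: str                   # "parallel", "translate-train" or "translate-test"
    first_lang: str             # assumed: context language
    second_lang: Optional[str]  # assumed: question language (None for translate-*)


def parse_subset(name: str) -> SubsetInfo:
    """Split a subset name into its kind and language codes."""
    prefix, _, rest = name.partition(".")
    codes = rest.split(".") if rest else []
    if prefix == "mlqa" and len(codes) == 2:
        return SubsetInfo("parallel", codes[0], codes[1])
    if prefix in ("mlqa-translate-train", "mlqa-translate-test") and len(codes) == 1:
        # Drop the leading "mlqa-" to get "translate-train" / "translate-test".
        return SubsetInfo(prefix[len("mlqa-"):], codes[0], None)
    raise ValueError(f"Unrecognised MLQA subset name: {name!r}")


parsed = [parse_subset(s) for s in SUBSETS]

# Every subset requested here involves Vietnamese on at least one side.
assert all("vi" in (p.first_lang, p.second_lang) for p in parsed)
```

The check at the end reflects the scope of this request (Languages: vie): only the Vietnamese-related subsets of MLQA are listed.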