EuSQuAD - Automatically Translated and Aligned SQuAD2.0 for Basque

EuSQuAD is a version of SQuAD2.0 for Basque. Our approach machine-translates the original corpus with a generic neural machine translation system and resolves mismatches between the translated contexts and answers via semantic text similarity. The resulting dataset is the same size as the original SQuAD2.0 dataset (over 142k question-answer pairs) and is readily usable for QA-related tasks in Basque.

Format and usage

EuSQuAD has the same JSON format and structure as the original SQuAD2.0, so the same code and tools should be able to load and use it.
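As an illustration of the point above, here is a minimal sketch of reading a SQuAD2.0-style file with the Python standard library only. It assumes the usual SQuAD2.0 layout (data > paragraphs > qas > answers); the function names and the file name in the usage note are hypothetical and not part of the EuSQuAD distribution.

```python
import json
from typing import Iterator


def iter_squad_examples(squad: dict) -> Iterator[dict]:
    """Yield one flat record per question from a SQuAD2.0-style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield {
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "context": context,
                    "question": qa["question"],
                    # SQuAD2.0 marks unanswerable questions with this flag.
                    "is_impossible": qa.get("is_impossible", False),
                    # Each answer is (text, character offset into the context).
                    "answers": [
                        (a["text"], a["answer_start"])
                        for a in qa.get("answers", [])
                    ],
                }


def load_squad_file(path: str) -> list:
    """Read a SQuAD2.0-style JSON file and return its flattened examples."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))
```

Usage would look like `examples = load_squad_file("eusquad-train.json")`, where the file name is a placeholder. Because the answers in a translated dataset are aligned to the translated context, it may be worth checking that `context[start:start + len(text)] == text` holds for each answer after loading.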

EuSQuAD can be requested from: http://link-to-download-form-or-whatever (TO BE UPDATED)

Authors

The following researchers collaborated in creating the EuSQuAD dataset:

  • Aitor García-Pablos
  • Naiara Perez
  • Montse Cuadros

Also, credit and thanks are due to the Machine Translation team from Vicomtech's HSLT department for providing the English-Basque translation service.

Contact

(TO BE UPDATED)

License

The same as the original SQuAD2.0

CC BY-SA 4.0

Other relevant information

If you use this dataset, please cite the following paper:

(INCLUDE SEPLN PAPER REFERENCE WHEN/IF PUBLISHED)