- Dataset Description
- Dataset Structure
- Dataset Creation
- Considerations for Using the Data
- Additional Information
- Homepage: http://github.com/pythainlp/thaiqa_squad (original thaiqa at https://aiforthai.in.th/)
- Repository: http://github.com/pythainlp/thaiqa_squad
- Paper:
- Leaderboard:
- Point of Contact: http://github.com/pythainlp/ (original thaiqa at https://aiforthai.in.th/)
`thaiqa_squad` is an open-domain, extractive question answering dataset (4,000 questions in `train` and 74 questions in `dev`) in SQuAD format, originally created by NECTEC from Wikipedia articles and adapted to SQuAD format by PyThaiNLP.
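To make "extractive" and "SQuAD format" concrete, here is a minimal sketch of how an answer is addressed as a character span inside its context. The record is a made-up English example, and the field names (`context`, `question`, `answers.text`, `answers.answer_start`) follow the common SQuAD convention; they are assumptions and may be named differently in the released files.

```python
# Illustrative record in the SQuAD convention; NOT taken from thaiqa_squad.
# Field names are assumed and may differ in the actual dataset files.
example = {
    "context": "Bangkok is the capital of Thailand.",
    "question": "What is the capital of Thailand?",
    "answers": {"text": ["Bangkok"], "answer_start": [0]},
}


def extract_answers(record):
    """Return the context slice addressed by each (text, answer_start) pair."""
    context = record["context"]
    spans = []
    for text, start in zip(record["answers"]["text"],
                           record["answers"]["answer_start"]):
        spans.append(context[start:start + len(text)])
    return spans


def is_extractive(record):
    """True when every answer is literally the context slice at its offset."""
    return extract_answers(record) == list(record["answers"]["text"])


print(extract_answers(example))  # ['Bangkok']
print(is_extractive(example))    # True
```

A check like `is_extractive` is a cheap way to catch offset drift after any preprocessing that edits the context string.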
extractive question answering
Thai
[More Information Needed]
[More Information Needed]
| | train | valid |
|---|---|---|
| # questions | 4000 | 74 |
| # avg words in context | 1186.740750 | 1016.459459 |
| # avg words in question | 14.325500 | 12.743243 |
| # avg words in answer | 3.279750 | 4.608108 |
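Thai is written without spaces between words, so "average words" figures like these depend on the word segmenter used. This card does not state which tokenizer produced them. The sketch below only shows the shape of the computation with a pluggable tokenizer; the whitespace default and the toy English strings are stand-ins, and substituting a Thai segmenter (for example PyThaiNLP's `word_tokenize`) is an assumption about how one might approximate the numbers, not a documented procedure.

```python
from statistics import mean


def average_length(texts, tokenize=str.split):
    """Mean number of tokens per text under the given tokenizer."""
    return mean(len(tokenize(t)) for t in texts)


# Toy English strings, for illustration only; not drawn from the dataset.
questions = ["Who wrote it ?", "Where is the river ?"]
print(average_length(questions))  # 4.5
```

For Thai text, pass a segmenter as `tokenize=`; different segmenters will give different averages.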
PyThaiNLP created `thaiqa_squad` as a SQuAD-format version of thaiqa. thaiqa is part of the 2nd Question Answering Program from Thai Wikipedia of the National Software Contest 2020.
[More Information Needed]
Wikipedia authors wrote the contexts; NECTEC produced the question and answer annotations.
[More Information Needed]
All content is from Wikipedia. No personal or sensitive information is expected to be included.
- open-domain, extractive question answering in Thai
[More Information Needed]
- The contexts include `<doc>` tags at the start and at the end.
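If the tags get in the way, they can be stripped before training, but removing the opening tag shifts every character offset in the context. The sketch below is one possible approach, assuming the tags take the form of an opening `<doc ...>` (with optional attributes) at the very start and a closing `</doc>` at the very end; the helper name `strip_doc_tags` and the sample string are hypothetical.

```python
import re

# Assumed tag shapes: an opening <doc ...> (attributes optional) at the very
# start and a closing </doc> at the very end of the context string.
_OPEN = re.compile(r"^\s*<doc\b[^>]*>\s*")
_CLOSE = re.compile(r"\s*</doc>\s*$")


def strip_doc_tags(context):
    """Return (clean_context, shift), where shift is the number of
    characters removed from the front of the original string."""
    m = _OPEN.match(context)
    shift = m.end() if m else 0
    clean = _CLOSE.sub("", context[shift:])
    return clean, shift


# Made-up sample, not taken from the dataset.
raw = '<doc id="1" title="Demo">Bangkok is the capital.</doc>'
clean, shift = strip_doc_tags(raw)
start = raw.index("Bangkok")       # answer offset in the tagged context
print(clean)                       # Bangkok is the capital.
print(clean[start - shift:][:7])   # Bangkok
```

Subtracting `shift` from each answer start offset keeps the answer spans aligned with the cleaned context.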
NECTEC for the original thaiqa. SQuAD formatting by PyThaiNLP.
CC-BY-NC-SA 3.0
[More Information Needed]