
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension




This repository maintains C3, the first free-form multiple-Choice Chinese machine reading Comprehension dataset.

Paper: Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension. Transactions of the Association for Computational Linguistics.

Files in this repository:

  • license.txt: the license of C3.
  • data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d represent "mixed-genre" and "dialogue", respectively. The data format is as follows.
    [
      [
        [
          document 1
        ],
        [
          {
            "question": document 1 / question 1,
            "choice": [
              document 1 / question 1 / answer option 1,
              document 1 / question 1 / answer option 2,
              ...
            ],
            "answer": document 1 / question 1 / correct answer option
          },
          {
            "question": document 1 / question 2,
            "choice": [
              document 1 / question 2 / answer option 1,
              document 1 / question 2 / answer option 2,
              ...
            ],
            "answer": document 1 / question 2 / correct answer option
          },
          ...
        ],
        document 1 / id
      ],
      [
        [
          document 2
        ],
        [
          {
            "question": document 2 / question 1,
            "choice": [
              document 2 / question 1 / answer option 1,
              document 2 / question 1 / answer option 2,
              ...
            ],
            "answer": document 2 / question 1 / correct answer option
          },
          {
            "question": document 2 / question 2,
            "choice": [
              document 2 / question 2 / answer option 1,
              document 2 / question 2 / answer option 2,
              ...
            ],
            "answer": document 2 / question 2 / correct answer option
          },
          ...
        ],
        document 2 / id
      ],
      ...
    ]
  • annotation/c3-{m,d}-{dev,test}.txt: question type annotations. Each file contains 150 annotated instances. We adopt the following abbreviations:
    Category              Abbreviation  Question Type
    --------------------  ------------  ------------------
    Matching              m             Matching
    Prior knowledge       l             Linguistic
                          s             Domain-specific
                          c-a           Arithmetic
                          c-o           Connotation
                          c-e           Cause-effect
                          c-i           Implication
                          c-p           Part-whole
                          c-d           Precondition
                          c-h           Scenario
                          c-n           Other
    Supporting sentences  0             Single sentence
                          1             Multiple sentences
                          2             Independent
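
As an illustration of the data format described above, here is a minimal Python sketch that flattens a dataset file into one record per question. It assumes each top-level entry is a three-element list of the form [document, questions, id], that "choice" is a list of option strings, and that "answer" holds the text of the correct option. The names `Example`, `iter_examples`, and `load_c3` are hypothetical helpers introduced for this example and are not part of the repository.

```python
import json
from typing import Iterator, NamedTuple


class Example(NamedTuple):
    """One question paired with its document (hypothetical helper type)."""
    doc_id: str
    document: list
    question: str
    choices: list
    answer: str


def iter_examples(records: list) -> Iterator[Example]:
    """Yield one Example per question.

    Assumes each record is [document, questions, id], as in the format above.
    """
    for document, questions, doc_id in records:
        for q in questions:
            yield Example(doc_id, document, q["question"], q["choice"], q["answer"])


def load_c3(path: str) -> list:
    """Read a file such as data/c3-m-dev.json and flatten it into Examples."""
    with open(path, encoding="utf-8") as f:
        return list(iter_examples(json.load(f)))


if __name__ == "__main__":
    # Made-up record in the documented layout; not taken from the dataset.
    sample = [[
        ["placeholder document text"],
        [{"question": "placeholder question",
          "choice": ["option A", "option B"],
          "answer": "option B"}],
        "placeholder-id",
    ]]
    for ex in iter_examples(sample):
        # Position of the correct option within the choice list.
        label = ex.choices.index(ex.answer)
        print(ex.doc_id, ex.question, label)
```

Converting the answer text to an index with `choices.index(answer)` is one convenient way to obtain a class label for a multiple-choice model; it relies on the assumption that the answer string matches one of the options exactly.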