This is a list of the papers and dataset URLs cited in our paper "Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends".



Introduction & Taxonomy

The URLs of the datasets mentioned in the paper are listed below.


Textbook QA

Dataset URL
TQA http://textbookqa.org
Geometry3K https://lupantech.github.io/inter-gps/
AI2D https://github.com/allenai/dqa-net
ScienceQA https://scienceqa.github.io/
MedQA https://github.com/jind11/MedQA
MedMCQA https://medmcqa.github.io
TheoremQA https://github.com/wenhuchen/TheoremQA


Dataset URL
Dolphin-18K http://research.microsoft.com/en-us/projects/dolphin/
DRAW-1K https://www.microsoft.com/en-us/research/publication/annotating-derivations-a-new-evaluation-strategy-and-dataset-for-algebra-word-problems/
Math23K https://ai.tencent.com/ailab/nlp/dialogue/#datasets
MathQA https://math-qa.github.io/math-QA/
ASDiv https://github.com/chaochun/nlu-asdiv-dataset
GSM8K https://github.com/openai/grade-school-math
IconQA https://iconqa.github.io/
TABMWP https://promptpg.github.io/


Dataset URL
SciQ https://allenai.org/data/sciq
RACE https://www.cs.cmu.edu/~glai1/data/race/
FairytaleQA https://github.com/uci-soe/FairytaleQAData
LearningQ https://dataverse.mpi-sws.org/dataverse/icwsm18
KHANQ https://github.com/Huanli-Gong/KhanQ
EduQG https://github.com/hadifar/question-generation
MCQL https://github.com/harrylclc/LTR-DG
Televic https://github.com/semerekiros/dist-retrieval



Dataset URL
CLC-FCE http://www.ilexir.com/
ASAP https://www.kaggle.com/c/asap-aes
TOEFL 11 https://catalog.ldc.upenn.edu/LDC2014T06
HSK http://yuyanziyuan.blcu.edu.cn/en/info/1043/1501.htm



Dataset URL
LANG-8 https://sites.google.com/site/naistlang8corpora/home
CLANG-8 https://github.com/google-research-datasets/clang8
BEA-2019 https://www.cl.cam.ac.uk/research/nl/bea2019st/
CTC https://destwang.github.io/CTC2021-explorer/
FCGEC https://github.com/xlxwalex/FCGEC
FlaCGEC https://github.com/hyDududu/FlaCGEC
GECCC https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-4639
RULEC-GEC https://github.com/arozovskaya/RULEC-GEC
Falko-MERLIN https://github.com/adrianeboyd/boyd-wnut2018
COWS-L2H https://github.com/ucdaviscl/cowsl2h
UA-GEC https://github.com/grammarly/ua-gec
RONACC https://github.com/TeodorMihai/RoGEC


Dataset URL
Defects4J https://github.com/rjust/defects4j
ManyBugs https://repairbenchmarks.cs.umass.edu/
IntroClass https://repairbenchmarks.cs.umass.edu/
QuixBugs https://github.com/jkoppel/QuixBugs
CodeReview https://github.com/microsoft/CodeBERT/tree/master/CodeReviewer


  title={Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends},
  author={Lan, Yunshi and Li, Xinyuan and Du, Hanyue and Lu, Xuesong and Gao, Ming and Qian, Weining and Zhou, Aoying},
  journal={arXiv preprint arXiv:2401.07518},