CEFR-SP provides 17k English sentences annotated with CEFR levels assigned by English-education professionals. For details of the corpus creation process and our CEFR-level assessment model, please refer to our paper.
The CEFR-SP corpus is in the /CEFR-SP directory, and the code for our CEFR-level assessment model is in the /src directory.
Please refer to the README in each directory for details.
Please cite the following papers if you use the above resources in your research.
Yuki Arase, Satoru Uchida, and Tomoyuki Kajiwara. 2022. CEFR-Based Sentence Difficulty Annotation and Assessment. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), pp. 6206-6219 (Dec. 2022).
Satoru Uchida, Yuki Arase, and Tomoyuki Kajiwara. 2024. Profiling English sentences based on CEFR levels. In ITL-International Journal of Applied Linguistics, Vol. 175, No. 1, pp. 103-126 (Mar. 2024).
@inproceedings{arase-etal-2022-cefr,
title = "{CEFR}-Based Sentence Difficulty Annotation and Assessment",
author = "Arase, Yuki and
Uchida, Satoru and
Kajiwara, Tomoyuki",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.416",
doi = "10.18653/v1/2022.emnlp-main.416",
pages = "6206--6219",
}
@article{uchida-etal-2024-profiling,
title = "Profiling {English} sentences based on {CEFR} levels",
author = "Uchida, Satoru and
Arase, Yuki and
Kajiwara, Tomoyuki",
editor = "Alfter, David and
Fran{\c{c}}ois, Thomas",
month = mar,
year = "2024",
doi = "10.1075/itl.22018.uch",
journal = "ITL-International Journal of Applied Linguistics (Belgium)",
volume = "175",
number = "1",
pages = "103--126",
publisher = "John Benjamins Publishing Company",
}
Yuki Arase (arase [at] c.titech.ac.jp) -- please replace " [at] " with an "@" symbol.