This project explores Natural Language Processing (NLP) techniques for feature extraction from ESL/TOEFL questions, specifically Error Identification questions. By identifying these features, the project aims to improve the accuracy of competency classification.
- Prerequisites:
  - Python 3.10.13
  - spaCy
  - en_core_web_sm (the md or lg model can be used instead)
  - TOEFL ITP Error Identification questions (CSV format)
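
As a rough illustration of how the spaCy model listed above might be used, the sketch below loads a pipeline by name and collects a few per-token attributes. The helper names `load_pipeline` and `token_features`, and the choice of attributes, are assumptions made for illustration; they are not taken from this project's code.

```python
from typing import Any, Dict, Iterable, List


def load_pipeline(model_name: str = "en_core_web_sm") -> Any:
    """Load a spaCy pipeline; pass "en_core_web_md" or "en_core_web_lg" to swap models."""
    import spacy  # imported lazily so token_features has no hard dependency on spaCy

    return spacy.load(model_name)


def token_features(doc: Iterable[Any]) -> List[Dict[str, str]]:
    """Collect text, POS tag, and dependency label per token (hypothetical feature set)."""
    return [
        {"text": tok.text, "pos": tok.pos_, "dep": tok.dep_}
        for tok in doc
    ]


# Example usage (requires spaCy and the model to be installed):
# nlp = load_pipeline()
# print(token_features(nlp("She go to school every day.")))
```

Switching to a larger model would only mean passing a different model name; the features the project actually extracts may differ from the three shown here.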
- Installation: Create a virtual environment and install the dependencies listed in requirements.txt.
- Basic Usage: Run main.py with the -f or --filename flag to specify the filename of the Error Identification question data.
- Examples:
  python main.py -f <filename>
  - <filename>: the filename of the Error Identification question data, located at the data/<filename>.csv path
  - -f, --filename: flag for the input filename in the data/ folder, in .csv format
- Output: <filename>_features_v2.json, located in the data/ folder
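
The usage and output conventions above can be sketched as follows. This is a minimal illustration, not the project's actual main.py: the function names `parse_args` and `resolve_paths` are hypothetical, and it assumes the flag is required and that the filename is given without the .csv extension, as the data/<filename>.csv pattern suggests.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Tuple

DATA_DIR = Path("data")  # input and output folder named in this README


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -f/--filename flag (marking it required is an assumption)."""
    parser = argparse.ArgumentParser(
        description="Extract features from Error Identification questions."
    )
    parser.add_argument(
        "-f",
        "--filename",
        required=True,
        help="name of the CSV file in data/, without the .csv extension",
    )
    return parser.parse_args(argv)


def resolve_paths(filename: str) -> Tuple[Path, Path]:
    """Map a bare filename to its input CSV path and output JSON path."""
    input_path = DATA_DIR / f"{filename}.csv"
    output_path = DATA_DIR / f"{filename}_features_v2.json"
    return input_path, output_path
```

Under these assumptions, `resolve_paths("sample")` would point at data/sample.csv for input and data/sample_features_v2.json for output.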
- This project is under active development, with the aim of improving the accuracy of competency classification in ESL/TOEFL assessments. It also serves as a learning platform for delving deeper into the capabilities of NLP.