/lang-8-process

Lang-8 preprocessing scripts

Primary language: Python

Lang-8 Preprocessing

This repo contains preprocessing scripts for extracting an English correction corpus from the Lang-8 Learner Corpora (https://sites.google.com/site/naistlang8corpora/). The scripts require Python 3. Install the following dependencies:

pip install joblib langid nltk tqdm
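
To confirm the environment is ready before running anything, a minimal sketch like the one below can help. It is not part of this repo: the names `REQUIRED`, `missing_dependencies`, and `report` are hypothetical, and it assumes each package is imported under the same name it is installed with.

```python
import importlib.util
import sys

# Package names taken from the pip command above; assumed to match
# their import names.
REQUIRED = ("joblib", "langid", "nltk", "tqdm")


def missing_dependencies(names=REQUIRED):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED):
    """Build a one-line, human-readable status message."""
    if sys.version_info.major < 3:
        return "Python 3 is required."
    missing = missing_dependencies(names)
    if missing:
        return "Missing packages, try: pip install " + " ".join(missing)
    return "All dependencies are available."


print(report())
```

The check only looks up whether each module can be located. It does not import the packages or verify their versions.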