/NUCLEprepocess

This code cleans up the NUCLE data into a format that can be used to build a parallel corpus of uncorrected and corrected essays.

Author: Kiarie Ndegwa, U4742829
Course Code: COMP8715

This folder contains all the data processing code, written in Python 2.7.

The HDF5 folder contains code that generates the word embeddings
needed by the seq2seq neural network.

The NUCLEPre folder contains code that cleans up the NUCLE2013/2014
data into a format the seq2seq neural net can consume.
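
As an illustration of what "parallel corpus" means here, the sketch below turns one uncorrected sentence plus a list of corrections into a (source, target) pair. It is only a minimal sketch and not the code in the NUCLEPre folder: the function name `apply_corrections` and the `(start, end, replacement)` character-offset edit format are assumptions made for this example.

```python
def apply_corrections(text, edits):
    """Return `text` with every (start, end, replacement) edit applied.

    The edit format is hypothetical: `start` and `end` are character
    offsets into the original text, and edits are assumed not to overlap.
    """
    corrected = text
    # Apply edits from right to left so earlier offsets stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        corrected = corrected[:start] + replacement + corrected[end:]
    return corrected


# One training pair: (uncorrected sentence, corrected sentence).
original = "He go to school yesterday ."
edits = [(3, 5, "went")]  # replace "go" with "went"
pair = (original, apply_corrections(original, edits))
```

Writing the first element of each pair to one file and the second to another, line by line, would give the two aligned sides of a parallel corpus.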