Author: Kiarie Ndegwa, U4742829

Course Code: COMP8715

This folder contains all the data processing code, written in Python 2.7. The HDF5 folder contains the code that generates the word embeddings needed by the seq2seq neural network. The NUCLEPre folder contains the code that cleans the NUCLE2013/2014 data into a format the seq2seq neural network can consume.
reedxiao/NUCLEprepocess
This cleans up the NUCLE data into a format that can be used to build a parallel corpus of corrected and uncorrected essays.
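To illustrate what building a parallel corpus involves, here is a minimal sketch, not the code in this repository. It assumes each annotation has already been reduced to a hypothetical `(start, end, replacement)` tuple of character offsets into the original sentence; the function names `apply_corrections` and `make_parallel_pair` are made up for this example, and the real NUCLE annotation format carries more fields than this.

```python
def apply_corrections(text, edits):
    """Return `text` with every (start, end, replacement) edit applied.

    `edits` is a hypothetical, simplified annotation format: character
    offsets into `text` plus the replacement string. Edits are assumed
    not to overlap.
    """
    corrected = text
    # Apply edits from right to left so earlier offsets stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        corrected = corrected[:start] + replacement + corrected[end:]
    return corrected


def make_parallel_pair(text, edits):
    """Return one (uncorrected, corrected) sentence pair."""
    return text, apply_corrections(text, edits)


if __name__ == "__main__":
    source, target = make_parallel_pair(
        "He go to school yesterday .",
        [(3, 5, "went")],
    )
    print(source)
    print(target)
```

Writing the first element of each pair to one file and the second to another, line by line, gives the aligned source and target files that a seq2seq model typically trains on.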