The goal of the project is to develop a recurrent neural network (RNN) capable of generating text.
The data source used to train the RNN is "A Tale of Two Cities" by Charles Dickens, obtained from Project Gutenberg.
The Project Gutenberg header material was removed, and the cleaned text is saved as a text file in this repo.
We built our RNN with reference to the following coding examples:
- Creating A Text Generator Using Recurrent Neural Network, by Trung Tran
- The Unreasonable Effectiveness of Recurrent Neural Networks, by Andrej Karpathy
Karpathy's post shows how character-by-character training and generation can produce varied output, including generated C code, Shakespeare, and more.
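A character-level model of this kind is trained on fixed-length windows of characters, each mapped to an integer index. The sketch below illustrates that preprocessing step; the toy text, variable names, and window size are illustrative assumptions, not taken from this repo's actual code.

```python
# Minimal sketch of character-level preprocessing for RNN training.
# The sample text and seq_len are illustrative placeholders.
text = "It was the best of times, it was the worst of times"

# Build the character vocabulary and index mappings.
chars = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(chars)}
ix_to_char = {i: c for i, c in enumerate(chars)}

# Slide a fixed-length window over the text: each input sequence
# of seq_len characters predicts the character that follows it.
seq_len = 10
inputs, targets = [], []
for i in range(len(text) - seq_len):
    inputs.append([char_to_ix[c] for c in text[i:i + seq_len]])
    targets.append(char_to_ix[text[i + seq_len]])
```

Each (input, target) pair then becomes one training example for the network.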
We observed the RNN model 'learning' various aspects of the text - from repetition of various words in the earlier epochs, to the understanding of punctuation and direct speech in later epochs. A possible next step is to test whether the RNN can reproduce word associations (in the spirit of word2vec) after sufficient training epochs.
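The generation stage described above samples one character at a time from the network's output distribution and feeds it back in as the next input. The following is a minimal sketch of that sampling loop using a vanilla RNN step in NumPy; the randomly initialised weights stand in for trained parameters, and the sizes are illustrative assumptions rather than the repo's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 5, 8  # illustrative sizes, not the repo's actual config

# Randomly initialised weights stand in for trained parameters.
Wxh = rng.normal(0, 0.01, (hidden, vocab))   # input-to-hidden
Whh = rng.normal(0, 0.01, (hidden, hidden))  # hidden-to-hidden
Why = rng.normal(0, 0.01, (vocab, hidden))   # hidden-to-output
bh = np.zeros(hidden)
by = np.zeros(vocab)

def step(x_ix, h):
    """One vanilla RNN step: one-hot input -> new hidden state -> softmax."""
    x = np.zeros(vocab)
    x[x_ix] = 1.0
    h = np.tanh(Wxh @ x + Whh @ h + bh)
    logits = Why @ h + by
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

# Generate by repeatedly sampling a character index and feeding it back in.
h = np.zeros(hidden)
generated = [0]  # seed character index
for _ in range(10):
    p, h = step(generated[-1], h)
    generated.append(int(rng.choice(vocab, p=p)))
```

With trained weights, the same loop reproduces the behaviour noted above: early in training the distribution favours frequent characters and words, while later epochs capture punctuation and quoting patterns.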