/ml_textgen

Machine Learning Project - Text Generation using RNN

Primary language: Jupyter Notebook

Text Generation using Recurrent Neural Networks (RNN)

Goals

The goal of the project is to develop a recurrent neural network (RNN) capable of generating text.

Dataset

The RNN is trained on "A Tale of Two Cities" by Charles Dickens, obtained from Project Gutenberg.

Some 'heading' data was removed from the downloaded text, and the result is saved as a text file in this repo.
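As a rough illustration of this kind of cleanup, the sketch below keeps only the text between the start and end marker lines that Project Gutenberg plain-text files typically contain. The function name `strip_gutenberg_boilerplate` and the marker prefixes are assumptions made for illustration (marker wording varies between files). This is not necessarily how the text file in this repo was produced.

```python
def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the '*** START OF' and '*** END OF' marker lines.

    Assumption: the markers begin with these prefixes. If a marker is
    missing, the corresponding end of the file is kept untouched.
    """
    lines = raw.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        upper = line.strip().upper()
        if upper.startswith("*** START OF"):
            start = i + 1          # body begins after the start marker
        elif upper.startswith("*** END OF"):
            end = i                # body stops before the end marker
            break
    return "\n".join(lines[start:end]).strip()


sample = (
    "Title and licence preamble\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK A TALE OF TWO CITIES ***\n"
    "\n"
    "It was the best of times,\n"
    "it was the worst of times,\n"
    "\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK A TALE OF TWO CITIES ***\n"
    "Licence text\n"
)
cleaned = strip_gutenberg_boilerplate(sample)
print(cleaned)
```

Any remaining front matter (title page, table of contents) would still need to be trimmed by hand or with book-specific rules.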

References

We drew on existing coding examples to build our RNN.

The example provided by Karpathy showed how character-by-character training and generation can produce varied output, including C code and Shakespeare-style text.
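To make the character-by-character idea concrete, here is a minimal sketch of the data side of such a model: mapping characters to integer ids, cutting the text into input/target sequences shifted by one character, and sampling the next character from a model's output scores. All function names, the `seq_len` value, and the temperature parameter are illustrative assumptions, not taken from this repo's notebook, and the RNN itself is left out.

```python
import math
import random


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for stability)."""
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    return chars, char_to_id


def make_training_pairs(text, char_to_id, seq_len):
    """Split text into (inputs, targets) where targets are inputs shifted by one."""
    ids = [char_to_id[c] for c in text]
    pairs = []
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs


def sample_next(logits, temperature=1.0, rng=random):
    """Sample a character id from unnormalised scores.

    Lower temperatures make the highest-scoring character more likely.
    """
    scaled = [score / temperature for score in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


chars, char_to_id = build_vocab("hello world")
pairs = make_training_pairs("hello world", char_to_id, seq_len=4)
first_inputs, first_targets = pairs[0]
print("".join(chars[i] for i in first_inputs))   # the first input sequence
print("".join(chars[i] for i in first_targets))  # the same window shifted by one
```

During generation, a trained model would supply the `logits` for each step, and the sampled id would be fed back in as the next input character.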

Results

We observed the RNN model 'learning' various aspects of the text, from repeating various words in the earlier epochs to handling punctuation and direct speech in later epochs. A possible next step is to see whether the RNN can reproduce word associations after sufficient training epochs (e.g. associations of the kind captured by word2vec).