
A minimal character-level LSTM network built with TensorFlow to generate sequences of text

char_wgu is a minimal character-level LSTM/RNN model that learns from sequences of text in a given file and generates new text based on what it has learnt from the input. This project draws its inspiration from Andrej Karpathy's phenomenal post, The Unreasonable Effectiveness of Recurrent Neural Networks. char_wgu is built with TensorFlow, and supporting the latest releases is a major priority.

Samples generated with char_wgu

After fairly short training runs on Google Colaboratory with moderately sized texts, the model generates surprising sequences. Some randomness is introduced into the output by adjusting the sampling temperature. See samply.py for more details.
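The effect of the sampling temperature can be illustrated with a minimal, framework-independent sketch. It assumes the model produces one unnormalized score (logit) per character in the vocabulary; the function names `softmax_with_temperature` and `sample_index` are hypothetical and are not taken from samply.py. Dividing the logits by the temperature before the softmax sharpens the distribution when the temperature is below 1.0 and flattens it when it is above 1.0.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits into probabilities, scaled by a sampling temperature.

    Hypothetical helper for illustration; not part of char_wgu.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one character index according to the temperature-scaled softmax."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: lower temperatures concentrate probability on the most likely character.
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.5))
print(softmax_with_temperature(logits, temperature=2.0))
print(sample_index(logits, temperature=1.0, rng=random.Random(0)))
```

A low temperature makes the generated text more conservative and repetitive; a high temperature makes it more varied but also more error-prone.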

  • I trained a model on the text of the Constitution of Nepal 2072BS. The text was relatively small (about 200k characters); the model was trained for 20 epochs at about 130 seconds per epoch.
I really like how the model decides to declare a state of emergency at {1}. The model seems to have learnt to use lists in no particular order. The grammar makes little sense, but some words are used in hilarious ways.

  • Another model was trained on the entire text of the Harry Potter series, including The Tales of Beedle the Bard.
  • More examples coming pretty soon. Lemme get through my exams first.

Here are a few things that might help you.

  • Keep the learning rate at about 10e-3 during training; that is the default used in the training script. With anything higher than 10e-2, the model adapts too strongly to the most recent batches and tends to regurgitate whatever text was in the training examples it saw towards the end of training.
  • Train on a decent machine with a supported CUDA GPU. Training on a machine with a 12GB K80, 12GB of RAM and an 8-core Xeon Phi was about 30x faster for me than on my Core 2 Duo machine with 4GB of RAM and no GPU. I trained my models on Google Colaboratory, which gives you free access to an IPython notebook running on a virtual machine with the specifications mentioned above, for 12 hours at a time. The instances also come with popular machine learning libraries pre-installed.
  • A model with 3 LSTM layers of 128 hidden units each is more than enough for most use cases. Anything larger has been very hard to train in my experience.
  • Use decently sized text files as input. Anything around 1-2 MB is enough for a taste of how effective these models are, but of course, the more the merrier.
  • Training time increases with:
      ◦ the number of LSTM layers
      ◦ the number of time steps over which gradients are computed (the backpropagation-through-time length)
      ◦ the number of hidden units in each LSTM layer
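The effect of layers, hidden units and time steps on training cost can be made concrete by counting parameters and multiply-adds. The sketch below assumes a standard LSTM layer with input weights, recurrent weights and a bias for each of its four gates (which matches the parameter count of TensorFlow's standard LSTM layers), one-hot character inputs and a dense softmax output layer. The helper names, the vocabulary size of 65 and the 50-step window are hypothetical values chosen for illustration, not char_wgu's actual configuration.

```python
def lstm_layer_params(input_size, hidden_units):
    """Trainable parameters in one LSTM layer: 4 gates, each with input
    weights, recurrent weights and a bias vector."""
    return 4 * (hidden_units * (input_size + hidden_units) + hidden_units)


def char_lstm_params(vocab_size, hidden_units, num_layers):
    """Total parameters for stacked LSTM layers plus a dense output layer
    (assumes one-hot character inputs, a hypothetical setup)."""
    total = 0
    input_size = vocab_size
    for _ in range(num_layers):
        total += lstm_layer_params(input_size, hidden_units)
        input_size = hidden_units  # deeper layers read the previous layer's output
    total += hidden_units * vocab_size + vocab_size  # dense softmax layer
    return total


def lstm_weight_macs_per_sequence(vocab_size, hidden_units, num_layers, time_steps):
    """Rough forward-pass multiply-adds per sequence: every LSTM weight is used
    once per time step (biases and nonlinearities are ignored)."""
    per_step = 0
    input_size = vocab_size
    for _ in range(num_layers):
        per_step += 4 * hidden_units * (input_size + hidden_units)
        input_size = hidden_units
    return per_step * time_steps


# Usage with the suggested 3 x 128 configuration and hypothetical sizes.
print(char_lstm_params(vocab_size=65, hidden_units=128, num_layers=3))
print(lstm_weight_macs_per_sequence(65, 128, 3, time_steps=50))
```

The recurrent weights grow quadratically with the number of hidden units, while adding a layer or lengthening the time window grows the work roughly linearly, which is consistent with the list above.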

To Do

  • Unit tests. UNIT TESTS. UNIT TESTS.
  • Comment the code properly. Some blocks are too convoluted to comprehend.
  • Randomize the training examples. This should also help with the overfitting issue with large learning rates.
  • Add support for other RNN cells like GRU cells.
  • Add a live loss plot within the IPython environment on Colab.
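One way the "randomize the training examples" item could be approached is sketched below: cut the text into fixed-length (input, target) pairs where the target is the input shifted by one character, then shuffle the order of the pairs each epoch. `make_training_pairs` and `shuffled_batches` are hypothetical helpers, not char_wgu's current pipeline, and the window length is an assumption.

```python
import random


def make_training_pairs(text, seq_len):
    """Cut text into non-overlapping (input, target) pairs; the target is the
    input shifted one character to the right."""
    pairs = []
    for start in range(0, len(text) - seq_len, seq_len):
        chunk = text[start:start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


def shuffled_batches(pairs, batch_size, seed=None):
    """Yield batches of pairs in a random order; a new seed gives a new order."""
    rng = random.Random(seed)
    order = list(range(len(pairs)))
    rng.shuffle(order)
    for i in range(0, len(order), batch_size):
        yield [pairs[j] for j in order[i:i + batch_size]]


# Usage: one shuffled epoch over a tiny corpus.
pairs = make_training_pairs("hello world, hello char_wgu", seq_len=4)
for batch in shuffled_batches(pairs, batch_size=2, seed=42):
    print(batch)
```

Shuffling like this only makes sense if the hidden state is reset for every sequence; a stateful setup that carries the LSTM state from one batch to the next relies on consecutive batches being contiguous in the text.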

Known Issues

  • The progress bar is rendered twice during training; the first one stops after the second one begins.
  • The progress bar overflows during sampling.
  • The hidden state of the LSTM cells needs to be preserved between sampling steps to avoid redundant calculations. Sampling is embarrassingly slow at the moment because 49 redundant steps are computed each time a new character is sampled. That is almost shameful.
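The cost difference between re-running a window and carrying the state forward can be illustrated with a toy, framework-independent sketch that only counts recurrent steps. `ToyCell`, `next_input` and the 50-character window are hypothetical stand-ins (the window size is inferred from the 49 redundant steps mentioned above). In TensorFlow, the same idea would mean keeping the LSTM state between calls, for example by requesting it with the Keras LSTM layer's `return_state=True` and passing it back via `initial_state`, rather than re-running the whole window.

```python
import math


class ToyCell:
    """Hypothetical stand-in for one recurrent step; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def step(self, x, h):
        self.calls += 1
        return math.tanh(0.5 * h + x)  # new hidden state


def next_input(h):
    """Hypothetical stand-in for sampling a character from the state."""
    return 1.0 if h > 0 else -1.0


def sample_windowed(cell, seed, n_chars, window=50):
    """Re-run the cell over the last `window` inputs for every new character."""
    seq = list(seed)
    for _ in range(n_chars):
        h = 0.0  # state is thrown away and rebuilt each time
        for x in seq[-window:]:
            h = cell.step(x, h)
        seq.append(next_input(h))
    return seq


def sample_stateful(cell, seed, n_chars):
    """Warm the state up on the seed once, then take one step per new character."""
    seq = list(seed)
    h = 0.0
    for x in seq:
        h = cell.step(x, h)
    for _ in range(n_chars):
        nxt = next_input(h)
        seq.append(nxt)
        h = cell.step(nxt, h)  # carry the state forward
    return seq


seed = [0.1] * 50
windowed, stateful = ToyCell(), ToyCell()
sample_windowed(windowed, seed, n_chars=100)
sample_stateful(stateful, seed, n_chars=100)
print(windowed.calls, stateful.calls)
```

For 100 new characters from a 50-character seed, the windowed loop runs 5,000 recurrent steps while the stateful loop runs 150. The two loops do not produce identical outputs in general: the windowed loop only sees the last 50 inputs, whereas the stateful loop carries information from the entire history.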