Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
Reference page: The Unreasonable Effectiveness of Recurrent Neural Networks
This model uses a simple (vanilla) RNN, not an LSTM. It takes characters as input: each character is encoded as a one-hot vector X whose dimension is the number of distinct characters in the input file.
Let the character vocabulary size be V and the hidden size be H; the output layer then also has size V.
This simple RNN model contains 3 weight matrices:
- Whh: H * H, hidden layer to hidden layer
- Wxh: H * V, input layer to hidden layer
- Why: V * H, hidden layer to output layer
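As a rough illustration of these shapes, the sketch below allocates the three matrices with numpy and checks that the matrix products are defined. The sizes V = 4 and H = 8, the 0.01 scaling, and the random seed are assumptions chosen for the example, not values taken from the script. Wxh is given shape H * V so that it can multiply a V * 1 input vector.

```python
import numpy as np

V, H = 4, 8  # hypothetical vocabulary size and hidden size

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
Whh = rng.standard_normal((H, H)) * 0.01  # hidden layer -> hidden layer
Wxh = rng.standard_normal((H, V)) * 0.01  # input layer  -> hidden layer
Why = rng.standard_normal((V, H)) * 0.01  # hidden layer -> output layer

x = np.zeros((V, 1))       # a V * 1 input vector
h_prev = np.zeros((H, 1))  # an H * 1 hidden state

# Both terms that feed the hidden layer are H * 1; the output is V * 1.
print((Whh @ h_prev).shape, (Wxh @ x).shape, (Why @ h_prev).shape)
```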
In the output layer, softmax is used to compute the probability distribution over characters, from which we can sample the next character given the previous input.
Update the hidden state:
h_t = tanh(Whh * h_{t-1} + Wxh * X)
Compute the output vector:
y = Why * h_t
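The two update equations above, followed by the softmax, can be sketched as a single step function. This is a minimal sketch rather than the script's actual code: the function name `rnn_step`, the sizes, and the random weights are assumptions for illustration, and bias terms are left out because the equations above do not include them.

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why):
    """One time step: new hidden state, raw output, and softmax probabilities."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)  # update hidden state
    y = Why @ h                          # compute output vector
    e = np.exp(y - np.max(y))            # subtract the max for numerical stability
    p = e / np.sum(e)                    # softmax over the V characters
    return h, y, p

V, H = 4, 8  # hypothetical sizes
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((H, V)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Why = rng.standard_normal((V, H)) * 0.01

x = np.zeros((V, 1))
x[0] = 1.0  # one-hot input for character index 0
h, y, p = rnn_step(x, np.zeros((H, 1)), Wxh, Whh, Why)
print(p.ravel())
```

The returned `h` would be passed back in as `h_prev` for the next character, which is how the network carries context along the sequence.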
The model reads the characters in the input.txt file and uses each character together with the character that follows it as a training pair. Each character is represented by a one-hot vector; the target is the character we expect given the current one.
For example, if we read "hello" from the input file, then 'h' and 'e' form a training pair and are encoded as vectors. Assume the vocab has only 4 characters, ('h','e','l','o'); then 'h' is encoded as [1,0,0,0]^T and 'e' as [0,1,0,0]^T.
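The "hello" example can be reproduced in a few lines. This is a sketch under the same 4-character vocab assumption; the helper name `one_hot` and the `char_to_ix` mapping are illustrative names, not necessarily those used in the script.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']  # the 4-character vocab assumed in the example
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a V * 1 one-hot column vector."""
    v = np.zeros((len(vocab), 1))
    v[char_to_ix[ch]] = 1.0
    return v

text = "hello"
# Each character is paired with the character that follows it.
pairs = list(zip(text[:-1], text[1:]))

print(pairs)
print(one_hot('h').ravel(), one_hot('e').ravel())
```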
The input layer has size V, so the input value is a V * 1 one-hot vector.
Hidden Layer
The hidden layer has size H; we also need to keep track of the hidden state (the value of the hidden layer) between time steps.
The output layer has size V. It yields a probability distribution over characters, from which we can sample a character given a sequence of inputs.
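Sampling from the output distribution might look like the sketch below. The probability vector `p` is made up for the example (in the real model it would come from the softmax), and the seed is fixed only so that the run is repeatable.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
p = np.array([0.1, 0.6, 0.2, 0.1])  # made-up distribution over the vocab

rng = np.random.default_rng(0)
ix = rng.choice(len(vocab), p=p)  # draw one index according to p
next_char = vocab[ix]

# Drawing many samples shows the frequencies following p.
draws = rng.choice(len(vocab), size=1000, p=p)
counts = np.bincount(draws, minlength=len(vocab))
print(next_char, counts)
```

To generate text, the sampled character would be one-hot encoded and fed back in as the next input.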
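This RNN model uses cross entropy as the cost function (error). The cross entropy for one training example is:
E = -sum_j t_j * log(p_j)
where t is the one-hot target vector and p = softmax(y) is the predicted probability distribution.
Because all but one of the values in t are 0, the above equation can be rewritten as:
E = -log(p_i)
Here i is the index of the 1 in vector t.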
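A quick numerical check shows that the full sum and the single-term form give the same value when the target is one-hot. The vectors `p` and `t` below are made-up example values, not output from the model.

```python
import numpy as np

p = np.array([0.1, 0.7, 0.1, 0.1])  # predicted distribution (example values)
t = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot target

loss_full = -np.sum(t * np.log(p))  # sum over every vocab entry

i = int(np.argmax(t))               # index of the 1 in t
loss_short = -np.log(p[i])          # only the target entry contributes

print(loss_full, loss_short)
```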
- Install numpy
pip install numpy
- Run this code
python min-char-rnn.py
The output of this model is a sequence of characters sampled given the current input characters. Example output:
iter 92400, loss: 48.542305
---- sample -----
----
cet for dons of he wast oune tofus shee loolf the was hering thity,
Youpres
To make his it you gain fell must out you yie t.
Wert; your geang'p his you cageiingeal my madm; -hat fould the conquall to
----