The notebook bi-lstm-crf-tensorflow.ipynb contains an example of a Bidirectional LSTM + CRF (Conditional Random Field) model in TensorFlow.
I tried to keep the problem and the implementation as simple as possible, so that anyone can understand the model and adapt it to their own problem and data.
To make the example more realistic, the inputs have variable sequence lengths.
We will define a simple sequence classification problem to explore bidirectional LSTMs + CRF.
Each input is a sequence of random values between 0 and 1.
A binary label (0 or 1) is associated with each input value. The labels start at 0; once the cumulative sum of the input values in the sequence exceeds a threshold, the label flips from 0 to 1 and stays there.
The threshold is one quarter of the sequence length.
For example, below is a sequence of 10 input timesteps (X):
0.63144003 0.29414551 0.91587952 0.95189228 0.32195638 0.60742236 0.83895793 0.18023048 0.84762691 0.29165514
In this case the threshold is 2.5 (10 / 4). The cumulative sum first exceeds it at the fourth timestep (about 2.79), so the corresponding classification output (y) is:
0 0 0 1 1 1 1 1 1 1
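The labeling rule above can be sketched in a few lines of plain Python. This is only an illustration and not necessarily how the notebook generates its data; the function names `label_sequence` and `make_sequence` are made up for this example.

```python
import random
from itertools import accumulate


def label_sequence(x):
    """Label each timestep 1 once the running sum exceeds len(x) / 4."""
    limit = len(x) / 4.0
    return [1 if total > limit else 0 for total in accumulate(x)]


def make_sequence(n_timesteps, rng=random):
    """Draw n_timesteps random values in [0, 1) and label them."""
    x = [rng.random() for _ in range(n_timesteps)]
    return x, label_sequence(x)


# The example sequence from the text
x = [0.63144003, 0.29414551, 0.91587952, 0.95189228, 0.32195638,
     0.60742236, 0.83895793, 0.18023048, 0.84762691, 0.29165514]
print(label_sequence(x))  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

Calling `make_sequence` with a different `n_timesteps` for each example gives the variable-length sequences mentioned earlier.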
Both bidirectional_dynamic_rnn and crf_log_likelihood accept the real sequence lengths of the inputs (without the padding): the former through its optional sequence_length parameter, the latter through its sequence_lengths argument.
When the model runs, bidirectional_dynamic_rnn returns zero vectors for the outputs past these lengths and carries the last valid state through, and crf_log_likelihood leaves the padded positions out of the likelihood.
Therefore, the weights are not trained on the padding.
Note: Padding is needed so that sequences of different lengths can be batched in TensorFlow, which speeds up the computations.
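As a rough illustration of what the padding and the real sequence lengths look like, here is a minimal sketch in plain Python. It does not use TensorFlow; `pad_batch` and `length_mask` are hypothetical helpers written for this example, and the pad value of 0.0 is an assumption.

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences to a common length.

    Returns the padded batch and the real (unpadded) length of each sequence.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def length_mask(lengths, max_len):
    """Return 1 for real timesteps and 0 for padded ones."""
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


batch, seq_lens = pad_batch([[0.2, 0.9, 0.4], [0.7]])
mask = length_mask(seq_lens, max_len=len(batch[0]))
print(batch)     # [[0.2, 0.9, 0.4], [0.7, 0.0, 0.0]]
print(seq_lens)  # [3, 1]
print(mask)      # [[1, 1, 1], [1, 0, 0]]
```

Here `seq_lens` plays the role of the real sequence lengths handed to the two TensorFlow functions, and `mask` shows which positions they would treat as padding.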
- Tutorial this code was based on: Bi-LSTM + CRF with character embeddings for NER and POS
- Bidirectional LSTM in Keras (contains the description of the problem used in this code): How to Develop a Bidirectional LSTM For Sequence Classification in Python with Keras
- Example of Bidirectional LSTM implementation in TensorFlow
- Example of CRF implementation in TensorFlow
- Explains in detail the LSTM "num_units" parameter: Understanding LSTM in Tensorflow
- Explains how variable sequence lengths work in TensorFlow: Variable Sequence Lengths in TensorFlow