See the demo here.
This repository contains utilities for doing handwriting prediction and handwriting synthesis with recurrent neural networks. The implementation closely follows Alex Graves's paper Generating Sequences With Recurrent Neural Networks.
It provides a complete working pipeline for training models on a custom dataset and sampling from them.
It includes:
- pre-trained models
- handwriting sampling and synthesis utilities
- data preparation utilities
- training utilities
- synthesis network architecture
- prediction network architecture
- exporting models to ONNX
- Python version 3.7.0 or greater
- Open the terminal in some directory.
- Clone the repository
git clone https://github.com/X-rayLaser/pytorch-handwriting-synthesis-toolkit.git
- Change a current working directory
cd pytorch-handwriting-synthesis-toolkit
- Create a virtualenv environment using Python 3
virtualenv --python='/path/to/python3/executable' venv
On Linux you can find the path to python3 executable using this command:
which python3
- Activate the virtual environment
. venv/bin/activate
- Install dependencies
pip install -r requirements.txt
Pre-trained models for synthesis and prediction are stored, respectively, under checkpoints/ and ucheckpoints/ directories.
Create 5 handwriting samples and save them in the "samples" directory (If a directory does not exist yet, it will be created)
python synthesize.py checkpoints/Epoch_46 'A single string of text ' -b 1 --samples_dir=samples --trials=5
Optional parameter -b specifies a probability bias. Higher values result in a cleaner, nicer looking handwriting, while lower values result in less readable but more diverse samples. Please, read a corresponding section in the paper for more detail.
By default, bias equals to 0 which corresponds to unbiased sampling. Just omit the bias parameter to do so:
python synthesize.py checkpoints/Epoch_56 'A single string of text ' --samples_dir=samples
Pass parameter --show_weights to create an attention heatmap
python synthesize.py checkpoints/Epoch_52 'A single string of text ' -b 1 --samples_dir=samples --show_weights
Pass parameter --heatmap to create a heatmap of predicted mixture densities
python synthesize.py checkpoints/Epoch_52 'A string of text ' -b 0.5 --samples_dir=samples --heatmap
Generate a handwriting page for a text file
python txt2script.py checkpoints/Epoch_52 test_document.txt -b 1 --output_path 'handwriting_page.png'
python sample.py ucheckpoints/Epoch_36 usamples --trials 5 -b 0.5
In order to carry out the experiments from the paper, you will need to download the IAM On-Line Handwriting Database (or shortly, IAM-OnDB).
It is also possible to work with a custom dataset, but one would require to implement a so-called data provider class with just 2 methods. More on that is discussed in later sections.
You can obtain the IAM-OnDB dataset from here The IAM On-Line Handwriting Database (IAM-OnDB).
Download the data set and unzip it into iam_ondb_home folder. The layout of the folder should be as follows:
├── ascii-all
│ └── ascii
├── lineImages-all
│ └── lineImages
├── lineStrokes-all
│ └── lineStrokes
├── original-xml-all
│ └── original
└── original-xml-part
└── original
Extract data examples from IAM-onDB dataset, preprocess it and save it into "data" directory (make sure that iam_ondb_home folder is located at the root of the repository):
python prepare_data.py data iam 9500 0 iam_ondb_home -l 700
After running this command, you should see a new folder called "data" containing 3 files:
.
├── charset.txt
├── train.h5
└── val.h5
Start training synthesis network for 50 epoch with a batch size 32 (this might take a lot of time, even on GPU).
python train.py -b 32 -e 50 -i 300 data checkpoints
Create 1 handwriting for the string 'Text to be converted to handwriting'.
python synthesize.py checkpoints/Epoch_46 'Text to be converted to handwriting ' -b 1 --samples_dir=samples --trials=5
This section very briefly describes steps needed to train (conditional) synthesis network. For more details, see dedicated sections below.
The toolkit already comes with a built-in data preparation utility. However, it requires a so-called data provider. If you want to use the IAM-onDB dataset, no further action is necessary. Otherwise, you have to write your own to let the toolkit know how to extract the data.
Once there is a provider class, the toolkit will automatically preprocess data and save it. The preprocessing mainly involves the following steps:
- flattening every raw handwriting into a list of 3-element tuples
(containing x and y coordinates as well as End-Of-Stroke flag) - replacing every coordinate with its offset from the previous one
- truncating the sequences longer than a specified threshold
The steps above apply only to the handwriting portion of the data. Transcripts remain unchanged.
Data preparation is done by running the command prepare_data.py. Executing the script will create two files in HDF5 format, 1 for training and 1 for validation examples. Along the way, the script also computes the mean and standard deviation and extracts all unique characters from transcripts.
The command expects at least two arguments: a path to a directory that will store prepared data and a name of a data provider. The name must match the data provider's name attributes (for example, iam for IAMonDBProvider class). The data provider class might have init method that takes arguments (as is the case with iam provider). If that's the case, you need to pass them in when calling the script.
An optional parameter --max_len sets the maximum length of handwriting. Any handwriting longer than max_len is going to be truncated to max_len points.
For more details on the command usage, see the Commands section.
The data provider is a class with a class attribute "name" and two methods: get_training_data and get_validation_data.
Methods have to return an iterable or generator of pairs of examples in a certain format. Other than that, these methods can contain any logic whatsoever.
Every example needs to be a tuple of size 2. The second element is a corresponding transcript as a Python string. The first element of the tuple stores handwriting represented as a list of strokes. A stroke is yet another list of tuples (x, y), where x and y are coordinates recorded by a pen moving on the screen surface. Here is an example of handwriting consisting of 3 strokes:
[
[(1, 2), (1, 3), (2, 5)],
[(10, 3), (15, 4), (18, 8)],
[(22, 10), (20, 5)]
]
Let's create a dummy data provider that returns only one training and validation example. First, open a python module handwriting_synthesis.data_providers.custom.py. Implement a new class named DummyProvider in that module.
class DummyProvider(Provider):
name = 'dummy'
def get_training_data(self):
handwriting = [
[(1, 2), (1, 3), (2, 5)],
[(10, 3), (15, 4), (18, 8)],
[(22, 10), (20, 5)]
]
transcript = 'Hi'
yield handwriting, transcript
def get_validation_data(self):
handwriting = [
[(1, 2), (1, 3), (2, 5)],
[(10, 3), (15, 4), (18, 8)],
[(22, 10), (20, 5)]
]
transcript = 'Hi'
yield handwriting, transcript
Here we yield the same data from both methods. If we wanted, we could fetch some data from a file or even download them via a network. It does not matter how the data gets retrieved. The only thing that matters is what the data provider returns or yields.
Note that name attribute we added to the class. We can now use it to tell the preparation script to use the data provider class with that name. It's time to test the provider:
python prepare_data.py temp_data dummy
You should see a directory temp_data containing HDF5 files and one text file with the text "Hi".
After preparing the data, running "train.py" script will start or resume training a network. You can train either handwriting prediction network (which can be used for unconditional sampling), or a synthesis network (text->handwriting network).
Network will be trained on GPU If CUDA device is available, otherwise CPU will be used.
After every epoch, network weights will be saved in a specified location. In case of interruption of the script, the script will load the model weights and continue training.
Finally, after specified number iterations, the script will sample a few hand writings from a network and save them in a folder.
By default, the script will use character set stored under <prepared_data_folder>/charset.txt. This file contains all characters that were found by scanning the text part of the dataset. This might include digits, punctuation as well as other non-letter characters. You might want to restrict the set to contain only white-space and letters. Just create a new text file and populate it with characters that you want your synthesis network to be able to produce.
$ python prepare_data.py --help
usage: prepare_data.py [-h] [-l MAX_LEN]
save_dir provider_name
[provider_args [provider_args ...]]
Extracts (optionally splits), preprocesses and saves data in specified
destination folder.
positional arguments:
save_dir Directory to save training and validation datasets
provider_name A short name used to lookup the corresponding factory
class
provider_args Variable number of arguments expected by a provider
__init__ method
optional arguments:
-h, --help show this help message and exit
-l MAX_LEN, --max_len MAX_LEN
Truncate sequences to be at most max_len long. No
truncation is applied by default
$ python train.py --help
usage: train.py [-h] [-u] [-b BATCH_SIZE] [-e EPOCHS] [-i INTERVAL]
[-c CHARSET] [--samples_dir SAMPLES_DIR] [--clip1 CLIP1]
[--clip2 CLIP2]
data_dir model_dir
Starts/resumes training prediction or synthesis network.
positional arguments:
data_dir Directory containing training and validation data h5
files
model_dir Directory storing model weights
optional arguments:
-h, --help show this help message and exit
-u, --unconditional Whether or not to train synthesis network (synthesis
network is trained by default)
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size
-e EPOCHS, --epochs EPOCHS
# of epochs to train
-i INTERVAL, --interval INTERVAL
Iterations between sampling
-c CHARSET, --charset CHARSET
Path to the charset file
--samples_dir SAMPLES_DIR
Path to the directory that will store samples
--clip1 CLIP1 Gradient clipping value for output layer. When omitted
or set to zero, no clipping is done.
--clip2 CLIP2 Gradient clipping value for lstm layers. When omitted
or set to zero, no clipping is done.
$ python evaluate.py --help
usage: evaluate.py [-h] [-u] data_dir path
Computes a loss and other metrics of trained network on validation set.
positional arguments:
data_dir Path to prepared dataset directory
path Path to a saved model
optional arguments:
-h, --help show this help message and exit
-u, --unconditional Whether or not the model is unconditional (assumes
conditional model by default)
$ python sample.py --help
usage: sample.py [-h] [-b BIAS] [-s STEPS] [-t TRIALS] [--thickness THICKNESS]
path sample_dir
Generates (unconditionally) samples from a pretrained prediction network.
positional arguments:
path Path to saved model
sample_dir Path to directory that will contain generated samples
optional arguments:
-h, --help show this help message and exit
-b BIAS, --bias BIAS A probability bias. Unbiased sampling is performed by
default.
-s STEPS, --steps STEPS
Number of points in generated sequence
-t TRIALS, --trials TRIALS
Number of attempts
--thickness THICKNESS
Handwriting thickness in pixels. It is set to 10 by
default.
$ python synthesize.py --help
usage: synthesize.py [-h] [-b BIAS] [--trials TRIALS] [--show_weights]
[--heatmap] [--samples_dir SAMPLES_DIR]
[--thickness THICKNESS]
model_path text
Converts a single line of text into a handwriting with a randomly chosen style
positional arguments:
model_path Path to saved model
text Text to be converted to handwriting
optional arguments:
-h, --help show this help message and exit
-b BIAS, --bias BIAS A probability bias. Unbiased sampling is performed by
default.
--trials TRIALS Number of attempts
--show_weights When set, will produce a plot: handwriting against
attention weights
--heatmap When set, will produce a heatmap for mixture density
outputs
--samples_dir SAMPLES_DIR
Path to the directory that will store samples
--thickness THICKNESS
Handwriting thickness in pixels. It is set to 10 by
default.
$ python txt2script.py --help
usage: txt2script.py [-h] [-b BIAS] [--output_path OUTPUT_PATH]
[--thickness THICKNESS]
model_path input_path
Converts a text file into a handwriting page.
positional arguments:
model_path Path to saved model
input_path A path to a text file that needs to be converted to a
handwriting
optional arguments:
-h, --help show this help message and exit
-b BIAS, --bias BIAS A probability bias. Unbiased sampling is performed by
default.
--output_path OUTPUT_PATH
Path to the generated handwriting file (by default, it
will be saved to the current working directory whose
name will be input_path with trailing .png extension)
--thickness THICKNESS
Handwriting thickness in pixels. It is set to 10 by
default.
[1] Alex Graves. Generating Sequences With Recurrent Neural Networks
If you find this repository useful, consider starring it by clicking at the ★ button. It would be much appreciated.