
Handwriting line generation

Handwriting... with style!

This is the code for the paper "Text and Style Conditioned GAN for Generation of Offline Handwriting Lines", published at BMVC 2020: https://arxiv.org/abs/2009.00678

This was originally Brian Davis's summer 2019 internship project at Adobe (https://github.com/adobe-research/hw_with_style). It was then extended (while at BYU) and finally published.

The trained models (snapshots) are available as a file in the release (https://github.com/herobd/handwriting_line_generation/releases/tag/w1.0).

Code structure is based on the victoresque PyTorch template.

Requirements

  • Python 3.x
  • PyTorch >1.0
  • torchvision
  • opencv
  • scikit-image
  • editdistance

Reproducibility instructions

The configs directory contains several JSON files with the parameters used for the paper. The "data_loader": "data_dir" entry needs to be set to the location of the dataset directory. You can also adjust the GPU options here.
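For example, in the IAM configs the relevant entries look roughly like this (the path is a placeholder; point "data_dir" at your copy of the dataset):

"data_loader": {
    "data_set_name": "AuthorHWDataset",
    "data_dir": "../data/IAM/",
    "batch_size": 1,
    "a_batch_size": 2
}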

First the handwriting recognition model and feature encoder networks need to be trained.

HWR: python train.py -c configs/cf_IAM_hwr_cnnOnly_batchnorm_aug.json

Encoder: python train.py -c configs/cf_IAM_auto_2tight_newCTC.json

Then the generator can be trained: python train.py -c configs/cf_IAMslant_noMask_charSpecSingleAppend_GANMedMT_autoAEMoPrcp2tightNewCTCUseGen_balB_hCF0.75_sMG.json

For the RIMES dataset, use:

HWR: python train.py -c configs/cf_RIMESLines_hwr_cnnOnly_batchnorm_aug.json

GAN: python train.py -c configs/cf_RIMESLinesslant_noMask_charSpecSingleAppend_GANMedMT_autoAEMoPrcp2tightNewCTCUseGen_balB_hCF0.75_sMG.json

RIMES reuses the IAM encoder.

Figures for the paper were made using generate.py

Generating images using generate.py

Usage: `python generate.py -c path/to/snapshot.pth -d output_directory -g #[optional gpu flag] -s style_pickle_file[optional] -T[optional, use test set]`
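For example (the snapshot and pickle names here are placeholders; the files from the release can be used instead): python generate.py -c saved/gen_snapshot.pth -d out_images -g 0 -s test_styles.pkl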

You can also use -h to see this usage.

It has several "modes" to select from once it has loaded the model. Use h to display them. Most will ask for additional input (desired text, number of styles, etc.). The output is saved to the supplied -d location.

Some modes need the dataset's styles in a pickle. I've included the IAM/RIMES test sets with the trained snapshots. Use get_styles.py to extract any additional ones. Some modes just want example images.

python get_styles.py -c path/to/snapshot.pth -d output_directory -g #[optional gpu flag] -T[optional, do test set, otherwise does train and valid]

Data

The RIMES training uses a text file of French (french_news.txt). This can really be any medium-sized (maybe even small) French text corpus. I used the one found at https://webhose.io/free-datasets/french-news-articles/. To compile its JSONs into the .txt file, use the script at data/compile_french_new.py: first create a file listing all the JSONs (ls > jsons.txt), then run the script.

I don't remember where I got the English text corpus (at least part of it is the OANC dataset, https://www.anc.org/data/oanc/download/), but really any one should do. It just expects a .txt file containing plain text.
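If you need to build such a file yourself, a minimal sketch like the following will do, assuming you have a directory of plain .txt files (all paths here are placeholders):

from pathlib import Path

# Concatenate a directory of plain-text files into one corpus file.
# Paths are placeholders; point them at your own corpus.
corpus_dir = Path("../data/oanc_texts")
with open("data/english_corpus.txt", "w", encoding="utf-8") as out:
    for txt_file in sorted(corpus_dir.glob("**/*.txt")):
        out.write(txt_file.read_text(encoding="utf-8", errors="ignore"))
        out.write("\n")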

Folder Structure


│
├── train.py - Use this to train
├── new_eval.py - Use this to evaluate and save images. Some examples of how to run this are in notes.txt. This was used to generate the reconstruction images for the paper.
├── get_styles.py - This uses a trained model to extract style vectors from a dataset and save them.
├── generate.py - This is an interactive script to generate images using a trained model, including interpolations. Figures for the paper were generally created using this.
├── umap_styles.py - This generates the umap plots used in the paper
├── graph.py - Display plots given a training snapshot
├── old_generate.py - This will generate images given a json having lists of style images, text lines, and output paths. (I don't know if this works still)
├── eval_writer_id.py - was intended to evaluate writer identification performance given the style vectors, but I don't know if I ever got it working correctly.
│
├── base/ - abstract base classes
│   ├── base_data_loader.py - abstract base class for data loaders (unused?)
│   ├── base_model.py - abstract base class for models
│   └── base_trainer.py - abstract base class for trainers
│
├── data/ - this has various files that were convenient to keep with the project
├── data_loader/ 
│   └── data_loaders.py - This just gets you the right dataset
│
├── datasets/ - default datasets folder
│   ├── hw_dataset.py - basic dataset made to handle IAM data at the line level
│   ├── author_hw_dataset.py - This sorts instances by author and has an 'a_batch_size', which is how many images from each author should be in the batch (batch_size is the number of authors, so the effective batch size is batch_size * a_batch_size)
│   ├── author_word_dataset.py - Same as above only at word level instead of line level
│   ├── (the "mixed_author" datasets have different splits which mix authors over the splits)
│   ├── author_rimes_dataset.py - Same as author_hw_dataset.py, but for RIMES words
│   ├── author_rimeslines_dataset.py - Same as author_hw_dataset.py, but for RIMES lines
│   ├── synth_hw_dataset.py - For training a HWR model using the generated images
│   ├── create_synth_text_dataset.py - Generate images for synth_hw_dataset.py (although it can generate them on the fly)
│   └── test*.py - These are scripts to run through a dataset and simply display what's being returned. For debugging purposes.
│
├── logger/ - for training process logging
│   └── logger.py
│
├── model/ - models, losses, and metrics
│   ├── loss.py - has all losses, here or imported here
│   ├── MUNIT_networks.py - has code from MUNIT paper as well as my own generator classes
│   ├── aligned_l1_loss.py - l1 loss after aligning input and target by sliding them along each other
│   ├── attention.py - functions for multi-head dot product attention
│   ├── author_classifier.py - auxiliary patch-based model to classify author id
│   ├── char_cond_discriminator_ap.py - discriminator with separate classifying heads for each character, accepts character specific style vectors
│   ├── char_gen.py - generator model which accepts character specific style vectors
│   ├── cnn_lstm.py - Code from Start, Follow, Read, with minor tweaks to allow Group Norm and logsoftmax at the end
│   ├── cnn_lstm_skip.py - above model, but has skip connections past the LSTM layers
│   ├── cnn_only_hwr.py - Alternate network to do HWR without recurrent layers. Doesn't work too well.
│   ├── google_hwr.py - Non-RNN based HWR based on "A Scalable Handwritten Text Recognition System"
│   ├── deform_hwr.py - Code to allow ElasticDeform and DeformableConvs in HWR network without style (for baseline)
│   ├── discriminator.py - Has various discriminators I tried, all using spectral norm
│   ├── cond_discriminator_ap.py - Discriminator used in the paper. Has many options to make it conditioned on style and text
│   ├── cond_discriminator.py - Older version, less options, doesn't use average pooling
│   ├── dtw_loss.py - l1 loss after aligning input and target using dynamic programming. Didn't have success with this.
│   ├── elastic_layer.py - Chris's code along with my modification to allow style appending
│   ├── grcl.py - Gated Recurrent Convolution Layer
│   │
│   ├── hw_with_style.py - Primary model which contains all submodels.
│   │
│   ├── join_net.py - Part of Curtis's code for training an aligning network. Never got this working
│   ├── key_loss.py - Defines the loss to force keys to be at least some distance from their closest neighbor
│   ├── lookup_style.py - intended to create a learned style vector for each author, rather than extracting them from example images
│   ├── mask_rnn.py - Both spacing and mask generation networks, RNN and CNN based
│   ├── net_builder.py - various auxiliary network construction functions from a previous project. I think I only use "getGroupSize()", for Group Norm
│   ├── pretrained_gen.py - generator model I experimented with where it was pretrained as the decoder of an autoencoder
│   ├── pure_gen.py - contains model from paper (PureGen) as well as several variations I experimented with
│   ├── simple_gan.py - Non-StyleGAN-based generator, and a simpler discriminator
│   ├── pyramid_l1_loss.py - exactly what it says
│   ├── style.py - old style extractor network
│   ├── char_style.py - style extractor from paper, that uses character specific heads
│   ├── vae_style.py - style extractor which predicts two parts for using with VAE
│   └── style_hwr.py - Various models of HWR that use a style vector.
│
├── saved/ - default checkpoints folder
│
├── configs/ - configuration files for reproducibility
├── old_configs/ - all the other config files I've used during development/other projects
│
├── trainer/ - trainers
│   ├── hw_with_style_trainer.py - This has the code to run training (there are two methods for training, depending on whether a curriculum is specified)
│   └── trainer.py
│
└── utils/
    ├── util.py - importantly has code to create a mask from a handwriting image and extract its centerline
    ├── augmentation.py - Chris's brightness augmentation
    ├── curriculum.py - this object handles tracking the curriculum during training
    ├── character_set.py - Gets the character set from label files (modified from Start, Follow, Read code)
    ├── error_rates.py - character error, etc
    ├── grid_distortion.py - Curtis's augmentation
    ├── metainit.py - I tried getting this paper to work: "MetaInit: Initializing learning by learning to initialize"
    ├── normalize_line.py - functions to normalize a line image
    ├── parseIAM.py - parses the XML ground truth that IAM provides
    ├── parseRIMESlines.py - parse the GT for RIMES into line images
    ├── string_utils.py - used for converting string characters to their class numbers and back
    └── util.py - various functions

Using old_generate.py

I'm not sure if this still works...

Usage: python old_generate.py -c path/to/snapshot.pth -l list_file.json

The list file has the following format:

It is a list of "jobs". Each job looks like:

{
    "style": ["path/image1", "path/image1",...],
    "text": ["generate this text", "and this text", ...],
    "out": ["path/save/this", "path/save/that", ...]
}

All of the style images are appended together to extract a single style, which is used to generate the given texts.

You can run a single job from the command line using the flags -s "path/image1 path/image2" -t "generate this text" -o path/save/this.

If you want to interpolate, add "interpolate": 0.1 to the job object. The float is the interpolation step size. The job must have only one text (and out path), and each style image will have a style extracted from it individually. The interpolation will be between each of the images' styles (and back to the first one).
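An interpolation job might therefore look like this (paths are placeholders, and I'm assuming the step is written as a normal JSON field):

{
    "style": ["path/image1", "path/image2", "path/image3"],
    "text": ["generate this text"],
    "out": ["path/save/this"],
    "interpolate": 0.1
}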

Config file format

Config files are in .json format.

{
  "name": "long_Gskip",                                       #name for saving
  "cuda": true,                                               #use GPUs
  "gpu": 0,                                                   #only single GPU supported
  "save_mode": "state_dict",                                  #can change to save whole model
  "override": true,                                           #if resuming, replace config file
  "super_computer":false,                                     #whether to mute log output
  "data_loader": {
      "data_set_name": "AuthorHWDataset",                     #class name

      "data_dir": "/trainman-mount/trainman-storage-8308c0a4-7f25-47ad-ae22-1de9e3faf4ad",    #IAM loaction on sensei
      "Xdata_dir": "../data/IAM/",
      "batch_size": 1,
      "a_batch_size": 2,
      "shuffle": true,
      "num_workers": 2,

      "img_height": 64,
      "max_width": 1400,
      "char_file": "./data/IAM_char_set.json",
      "mask_post": ["thresh","dilateCircle","errodeCircle"],
      "mask_random": false,
      "spaced_loc": "../saved/spaced/spaced.pkl"
  },
  "validation": {
      "shuffle": false,
      "batch_size": 3,
      "a_batch_size": 2,
      "spaced_loc": "../saved/spaced/val_spaced.pkl"
  },

  
  "lr_scheduler_type": "none",

  "optimizer_type": "Adam",
  "optimizer": {
      "lr": 0.00001,
      "weight_decay": 0,
      "betas": [0.5,0.99]
  },
  "optimizer_type_discriminator": "Adam",                 #seperate optimizer for discriminators
  "optimizer_discriminator": {
      "lr": 0.00001,
      "weight_decay": 0,
      "betas": [0,0.9]
  },
  "loss": {                                               #adversarial losses (generative and discriminator) are hard coded and not specified here
      "auto": "pyramidL1Loss",
      "key": "pushMinDist",
      "count": "MSELoss",
      "mask": "HingeLoss",
      "feature": "L1Loss",
      "reconRecog": "CTCLoss",
      "genRecog": "CTCLoss",
      "genAutoStyle": "L1Loss"
  },
  "loss_weights": {                                       #multiplied to loss to balance them
      "auto": 1,
      "discriminator": 0.1,
      "generator": 0.01,
      "key": 0.001,
      "count": 0.1,
      "mask": 0.1,
      "mask_generator": 0.01,
      "mask_discriminator": 0.01,
      "feature": 0.0000001,
      "reconRecog": 0.000001,
      "genRecog": 0.0001,
      "style_discriminator": 0.1,
      "style_generator": 0.01,
      "genAutoStyle": 0.1

  },
  "loss_params":                                          #additional params passed directly to function
      {
          "auto": {"weights":[0.4,0.3,0.3],
                   "pool": "avg"},
          "key": {"dist":"l1",
                  "thresh": 1.0},
          "mask": {"threshold": 4}
      },
  "metrics": [],                                          #unused
  "trainer": {
      "class": "HWWithStyleTrainer",
      "iterations": 700000,                               #Everything is iterations, because epochs are weird.
      "save_dir": "../saved/",
      "val_step": 2000,                                   #how frequently to run through validation set
      "save_step": 5000,                                  #how frequently to save a seperate snapshot of the training & model
      "save_step_minor": 250,                             #how frequently to save a "latest" model (overwrites)
      "log_step": 100,                                    #how frequently to print training stats
      "verbosity": 1,
      "monitor": "loss",
      "monitor_mode": "none",
      "space_input": true,
      "style_together": true,                             #append sty
      "use_hwr_pred_for_style": true,
      "hwr_without_style":true,
      "slow_param_names": ["keys"],
      "curriculum": {
          "0": [["auto"],["auto-disc"]],
          "1000": [["auto", "auto-gen"],["auto-disc"]],
          "80000": [["count","mask","gt_spaced","mask-gen"],["auto-disc","mask-disc"]],
          "100000": [  [1,"count","mask","gt_spaced","mask-gen"],
                      ["auto","auto-gen","count","mask","gt_spaced","mask_gen"],
                      ["auto","auto-mask","auto-gen","count","mask","gt_spaced","mask_gen"],
                      [2,"gen","gen-auto-style"],
                      [2,"disc","mask-disc"],
                      [2,"auto-disc","mask-disc"]]
      },
      "balance_loss": true,
      "interpolate_gen_styles": "extra-0.25",
      "text_data": "data/lotr.txt",
      "use_learning_schedule": false
  },
  "arch": "HWWithStyle", 
  "model": {
      "num_class": 80,
      "generator": "SpacedWithMask",
      "gen_dim": 128,
      "gen_n_res1": 2,
      "gen_n_res2": 3,
      "gen_n_res3": 2,
      "gen_use_skips": true,
  "hwr": "CRNN_group_norm_softmax",
      "pretrained_hwr": "../saved/IAM_hwr_softmax_aug/checkpoint-latest.pth",
      "hwr_frozen": true,
      "style": "new",
      "style_norm":"group",
      "style_activ":"relu",
      "style_dim": 256,
      "num_keys": 64,
      "global_pool": true,
      "discriminator": "two-scale-better more low global",
      "spacer": "duplicates",
      "create_mask": true,
      "mask_discriminator": "with derivitive"

  }
}

Using config files

Modify the configurations in .json config files, then run:

python train.py --config config.json

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python train.py --resume path/to/checkpoint

The checkpoints will be saved in save_dir/name/.

The config file is saved in the same folder (as a reference only; when resuming, the config is loaded from the checkpoint).

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'logger': self.train_logger,
  'state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
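For a quick sanity check you can load a snapshot with PyTorch and look at these entries (a minimal sketch; the path is a placeholder):

import torch

# Load a checkpoint on the CPU and inspect its bookkeeping fields.
checkpoint = torch.load('saved/long_Gskip/checkpoint-latest.pth', map_location='cpu')
print(checkpoint['arch'], checkpoint['epoch'])
print(checkpoint['config']['model'])   # the model section of the config it was trained with
print(len(checkpoint['state_dict']))   # number of parameter tensors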

UMAPing

Save styles using get_styles.py, then run: python umap_styles.py styles.pkl [image dir]