Online-Handwritten George Washington Dataset

The George Washington dataset is a widely used collection of scanned pages of handwritten letters by George Washington and his associates. In particular, Series 2, Letterbook 1, pages 270-279 and 300-309 are often used in word spotting experiments (see also Fischer et al.). We needed an online-handwritten version of those pages, and since no such dataset was available, we created one ourselves.

Format

You will find a subfolder for each of the 20 pages of the original George Washington dataset. All pages were written by a single writer. Each subfolder contains one text file per word, holding the online trajectory of that word (a word that is broken across a line is usually also split into two text files). The first line of each file contains the string representation of the word. Every following line contains a single point and has the format

x<space>y<space>pen-state

where

  • x and y are the coordinates of the point in a coordinate system whose origin is the bottom-left corner, with the y-axis growing upwards
  • pen-state tells whether the pen stayed on the writing surface after creating the point (0) or was lifted up (1).
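
The format above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: the function name parse_trajectory and the sample text are made up for illustration, and it assumes that coordinates are numeric and that fields are separated by whitespace. It groups points into strokes by closing a stroke whenever pen-state is 1.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_trajectory(text: str) -> Tuple[str, List[List[Point]]]:
    """Parse the contents of one word file into (word, strokes).

    Hypothetical helper: the first non-empty line is taken as the word,
    every further line as "x y pen-state".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    word = lines[0].strip()

    strokes: List[List[Point]] = []
    current: List[Point] = []
    for ln in lines[1:]:
        x_str, y_str, pen_str = ln.split()
        current.append((float(x_str), float(y_str)))
        # pen-state 1: the pen was lifted after this point, so the stroke ends
        if int(float(pen_str)) == 1:
            strokes.append(current)
            current = []
    if current:
        # Keep trailing points even if the file does not end with a pen lift
        strokes.append(current)
    return word, strokes


# Made-up sample in the documented format (not taken from the dataset)
sample = "the\n10 20 0\n12 25 0\n15 22 1\n30 20 0\n32 28 1\n"
word, strokes = parse_trajectory(sample)
print(word, len(strokes))
```

To read an actual file, pass its contents to the function, for example the result of open(path).read() for one of the word files in a page subfolder.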

The render.py tool will render a given trajectory using numpy and matplotlib.
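
If you want to draw trajectories yourself, the sketch below may serve as a starting point. It is not the repository's render.py, whose interface is not described here; the names to_image_coords and plot_strokes are hypothetical, and strokes are assumed to be lists of (x, y) tuples in the dataset's coordinate system. Since the dataset's y-axis grows upwards, matplotlib can plot the points directly, whereas raster images (origin in the top-left corner) need the y-axis flipped.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_image_coords(strokes: List[List[Point]]) -> List[List[Point]]:
    """Shift x to start at 0 and flip y so the origin is the top-left corner."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, max_y = min(xs), max(ys)
    return [[(x - min_x, max_y - y) for x, y in stroke] for stroke in strokes]


def plot_strokes(strokes: List[List[Point]], title: str = "") -> None:
    """Draw each stroke as a separate line, in dataset coordinates (y up)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    for stroke in strokes:
        xs, ys = zip(*stroke)
        ax.plot(xs, ys, color="black")
    ax.set_aspect("equal")  # keep the handwriting undistorted
    ax.set_title(title)
    plt.show()


# Made-up strokes for illustration (not taken from the dataset)
example = [[(10.0, 20.0), (12.0, 25.0)], [(30.0, 22.0)]]
print(to_image_coords(example))
```

Plotting one line per stroke, rather than one line through all points, avoids drawing connections across pen lifts.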

Terms of Use

This dataset may only be used for non-commercial research and educational purposes. If you use it in your scientific work, please cite the following paper:

Christian Wieprecht, Leonard Rothacker, Gernot A. Fink, "Word Spotting in Historical Document Collections with Online-Handwritten Queries", In Proc. IAPR Int. Workshop on Document Analysis Systems, Santorini, Greece, 2016