TIMEUS CORPUS

Description

Ground truth datasets for the handwritten text recognition (HTR) of 18th- and 19th-century French documents, produced by the ANR project TIME US.

Content

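Data are stored in the data/ folder, with one subfolder per dataset. Each subfolder is organized as follows:

  • all the images are at the root of the subfolder;
  • ALTO XML versions are in the alto/ folder;
  • PAGE XML versions are in the page/ folder.

| # | Name                   | Nb of images | GT for segmenter? | GT for recognizer? | Description                                                                                  |
|---|------------------------|--------------|-------------------|--------------------|----------------------------------------------------------------------------------------------|
| 1 | cph_paris_tissage_1858 | 159          | no                | yes                | Registers from the Prud'hommes Court for the Textile Industry in Paris, January to June 1858 |
| 2 | cph_paris_tissage_1878 | 89           | no                | yes                | Registers from the Prud'hommes Court for the Textile Industry in Paris, January 1878         |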
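
To illustrate the folder layout described above, here is a minimal Python sketch (standard library only) that walks data/, pairs each image with its ALTO and PAGE files, and counts the TextLine elements in the ALTO files. It assumes that each XML file has the same file name stem as its image with a .xml extension, and that images use one of the extensions listed in IMAGE_SUFFIXES; neither convention is documented here, so check them against the actual files. The function names are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumed image extensions at the root of each dataset subfolder (hypothetical list).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_text_lines(xml_path: Path) -> int:
    """Count <TextLine> elements, whatever ALTO or PAGE namespace version is used."""
    root = ET.parse(xml_path).getroot()
    return sum(1 for el in root.iter() if local_name(el.tag) == "TextLine")


def index_subfolder(folder: Path) -> list[dict]:
    """Pair each root-level image with same-stem XML files in alto/ and page/.

    The same-stem naming convention is an assumption, not a documented rule.
    """
    records = []
    for image in sorted(folder.iterdir()):
        # Skip the alto/ and page/ directories and any non-image file.
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        alto = folder / "alto" / f"{image.stem}.xml"
        page = folder / "page" / f"{image.stem}.xml"
        records.append({
            "image": image.name,
            "alto": alto if alto.exists() else None,
            "page": page if page.exists() else None,
            "alto_lines": count_text_lines(alto) if alto.exists() else 0,
        })
    return records


def summarize(data_dir: Path) -> dict[str, dict[str, int]]:
    """Return per-subfolder counts of images, matched XML files and ALTO lines."""
    summary = {}
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        records = index_subfolder(folder)
        summary[folder.name] = {
            "images": len(records),
            "with_alto": sum(r["alto"] is not None for r in records),
            "with_page": sum(r["page"] is not None for r in records),
            "alto_lines": sum(r["alto_lines"] for r in records),
        }
    return summary


if __name__ == "__main__":
    data_dir = Path("data")  # assumes the script is run from the repository root
    if data_dir.is_dir():
        for name, stats in summarize(data_dir).items():
            print(name, stats)
```

Comparing the "images" count with the "Nb of images" column of the table is a quick way to spot images whose XML files do not follow the assumed naming. Matching tags by local name rather than by a fixed namespace keeps the sketch independent of the ALTO or PAGE schema version actually used in the files.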

Annotation system

...

How to cite

This dataset was built within the ANR project TIME US. It is maintained by Alix Chagué (@alix-tz). The original documents are copyright-free, and so are the digitizations and the transcriptions. However, digitizing archives and properly annotating a corpus take time, and this work deserves recognition. If you use any item from this ground truth corpus, please cite the dataset as follows:

Chagué, A., Champougny, K., Meissel, N., Genero, J., Skilbeck-Gaborit, E., Vanneau, L., Bey, L., Le Fourner, V., Albert, A., Riondet, C., & Martini, M. Time Us Corpus [Data set]. https://github.com/HTR-United/timeuscorpus

@misc{Chague_Time_Us_Corpus,
author = {Chagué, Alix and Champougny, Kévin and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau, Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet, Charles and Martini, Manuela},
title = {{Time Us Corpus}},
url = {https://github.com/HTR-United/timeuscorpus}
}

This work is licensed under a Creative Commons Attribution 4.0 International License.
