Ground truth datasets for handwritten text recognition (HTR) of French 18th- and 19th-century documents, produced by the ANR project TIME US.
Data are stored in the data/ folder. Each folder is organized as follows:
- all the images are at the root level
- ALTO XML versions are in the alto/ folder
- PAGE XML versions are in the page/ folder
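
The layout described above can be walked with a short script. This is a minimal sketch, not part of the corpus tooling: the function name `pair_images_with_gt` is hypothetical, the list of image extensions is a guess, and the README does not confirm that each XML file shares the base name of its image.

```python
from pathlib import Path

# Assumption: the README does not list image formats; adjust this set as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_images_with_gt(subset_dir):
    """Map each image at the root of a folder to its ALTO and PAGE files.

    Assumption: an XML file has the same stem as its image
    (e.g. page_001.jpg -> alto/page_001.xml). Missing files map to None.
    """
    subset = Path(subset_dir)
    pairs = {}
    for image in sorted(subset.iterdir()):
        # Skip the alto/ and page/ subfolders and any non-image file.
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        alto = subset / "alto" / f"{image.stem}.xml"
        page = subset / "page" / f"{image.stem}.xml"
        pairs[image.name] = {
            "alto": alto if alto.exists() else None,
            "page": page if page.exists() else None,
        }
    return pairs
```

Calling it on one folder, for instance `pair_images_with_gt("data/cph_paris_tissage_1858")` if the folders under data/ are named after the datasets listed in the table, returns a dictionary keyed by image file name, which makes it easy to spot images that lack one of the two XML versions.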
# | name | number of images | GT for segmenter? | GT for recognizer? | description |
---|---|---|---|---|---|
1 | cph_paris_tissage_1858 | (159) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January to June 1858 |
2 | cph_paris_tissage_1878 | (89) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January 1878 |
...
This dataset was built within the ANR project TIME US. It is maintained by Alix Chagué (@alix-tz). The original documents are copyright-free, as are the digitization and the transcription. However, digitizing archives and properly annotating a corpus takes time, and this work should be recognized. If you use any item from this ground truth corpus, cite the dataset using the following information:
Chagué, A., Champougny, K., Meissel, N., Genero, J., Skilbeck-Gaborit, E., Vanneau, L., Bey, L., Le Fourner, V., Albert, A., Riondet, C., & Martini, M. Time Us Corpus [Data set]. https://github.com/HTR-United/timeuscorpus
@misc{Chague_Time_Us_Corpus,
author = {Chagué, Alix and Champougny, Kévin and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau, Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet, Charles and Martini, Manuela},
title = {{Time Us Corpus}},
url = {https://github.com/HTR-United/timeuscorpus}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.