This project shows a simple (and unfinished) example of using Keras to recognize text in digitized documents.
No layout segmentation is done in this project.
You can download the CVL-Dataset here: https://cvl.tuwien.ac.at/research/cvl-databases/icdar2013-handwritten-digit-and-digit-string-recognition-competition/
Unpack the images into /handwritten_text_recognition/assets/cvl_dataset
In the project root folder run:
python src/handwritten_text_recognition/train.py
To train on your own data, make sure that each image and its ground truth are located in the same folder. Each ground-truth file should have the same base name as its image and the .txt extension.
```sh
python src/handwritten_text_recognition/train.py with folder_containing_training_samples
```
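Before training on your own folder, it can help to check that every image has a matching ground-truth file. The following is a minimal sketch of such a check and is not part of this project: the function name `find_unpaired_images` and the list of image extensions are assumptions made for illustration, so adjust them to your data.

```python
from pathlib import Path

# Assumption: these image extensions are illustrative, not taken from the project.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_unpaired_images(folder):
    """Return the names of images in `folder` that lack a same-named .txt file."""
    folder = Path(folder)
    missing = []
    for path in sorted(folder.iterdir()):
        # Only consider regular files that look like images.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # The ground truth is expected next to the image, e.g. line1.png -> line1.txt.
            if not path.with_suffix(".txt").is_file():
                missing.append(path.name)
    return missing
```

An empty list means every image found has a ground-truth file next to it. Files with other extensions are ignored.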
To predict text, use the recognize_text function in the module src/handwritten_text_recognition/ocr/text_recognition.py. It takes a line image and predicts the text.
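Since recognize_text works on a single line image, a page has to be split into line images by some other means before it can be transcribed. The sketch below shows one way to apply a line-level recognizer to several line images. The helper `recognize_lines` is hypothetical, and the exact signature and import path of recognize_text are assumptions; the recognizer is passed in as a parameter so that the sketch does not depend on them.

```python
def recognize_lines(line_images, recognize):
    """Apply a line-level recognizer to each line image and join the results.

    `recognize` is any callable that maps one line image to a string,
    for example the project's recognize_text (signature assumed).
    """
    return "\n".join(recognize(image) for image in line_images)


# Hypothetical usage, assuming the package is importable under this path
# and that `line_images` was loaded in whatever form recognize_text expects:
#
#   from handwritten_text_recognition.ocr.text_recognition import recognize_text
#   text = recognize_lines(line_images, recognize_text)
```

Passing the recognizer as an argument also makes it easy to substitute a stub when testing the surrounding code.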
- [ ] Tests
- [ ] GPU support
- [ ] Post-correction