
This repository contains a collection of layout analysis and text recognition models.

License: Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0)

OCR Model Catalogue

Welcome to our repository, where we have compiled a diverse range of Layout Analysis (LA) and Optical Character Recognition (OCR) models. This collection is aimed at providing researchers, developers, and hobbyists with easy access to a variety of OCR models.

About OCR

Optical Character Recognition (OCR) is a field of study that involves the conversion of typed, handwritten, or printed text into machine-encoded text. OCR technology is used to digitize printed texts, so that they can be electronically edited, searched, stored more compactly, and used in machine processes such as machine translation, text-to-speech, and data mining.

We provide LA and OCR models for different OCR engines.

Repository Structure

The structure of the repo is the following:

├── LICENSE.md
└── data
   └── OCR-Model as submodule   

Here's our OCR Model Catalogue:

📂 Models

| Model | OCR-Engine | Type of model | Description | Default model |
| --- | --- | --- | --- | --- |
| German print | Kraken | Text recognition | Kraken model for German prints trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German print | Tesseract | Text recognition | OCR model for German prints trained from several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German print | Tesseract | Text recognition | OCR model for German prints trained from several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken model with kraken topology for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken model with sgd topology for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken model with htr+ topology for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken model with htru topology for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken model with gpt topology for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Kraken | Text recognition | Kraken (default) model for German newspapers trained from several datasets. See https://github.com/UB-Mannheim/kraken/wiki/Training-German-Print | Download |
| German newspapers | Tesseract | Text recognition | OCR model for German newspapers trained from several datasets. Best model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers | Download |
| German newspapers | Tesseract | Text recognition | OCR model for German newspapers trained from several datasets. Fast model variant for Tesseract. See https://github.com/UB-Mannheim/kraken/wiki/Training-german-newspapers | Download |
| UBMA Segmentation | Kraken | Layout analysis | Kraken segmentation model for a wide range of materials. | Download |
| Historical Reports 2col | Kraken | Layout analysis | A Kraken segmentation model for 2 column layout. | Download |
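
Once a text recognition model has been downloaded, it is passed to the matching engine on the command line. The following is a minimal sketch of how those command lines could be assembled, not an official usage guide. The helper `build_ocr_command` and the file names `german_print.traineddata` and `german_print.mlmodel` are hypothetical; the actual file names depend on the download. The Kraken call assumes the classic `binarize segment ocr` pipeline, which may not be the right segmentation mode for every model listed here.

```python
from pathlib import Path


def build_ocr_command(engine, model_path, image_path, output_base):
    """Return the argument list for running a downloaded model (hypothetical helper)."""
    model = Path(model_path)
    if engine == "tesseract":
        # Tesseract looks up <lang>.traineddata inside --tessdata-dir,
        # so the language name is the model file name without its suffix.
        return [
            "tesseract", str(image_path), str(output_base),
            "--tessdata-dir", str(model.parent),
            "-l", model.stem,
        ]
    if engine == "kraken":
        # Kraken takes the recognition model file directly via `ocr -m`.
        return [
            "kraken", "-i", str(image_path), f"{output_base}.txt",
            "binarize", "segment", "ocr", "-m", str(model),
        ]
    raise ValueError(f"unsupported engine: {engine}")


# Hypothetical file names; replace them with the files you downloaded.
tesseract_cmd = build_ocr_command(
    "tesseract", "models/german_print.traineddata", "page.png", "page")
kraken_cmd = build_ocr_command(
    "kraken", "models/german_print.mlmodel", "page.png", "page")
print(" ".join(tesseract_cmd))
print(" ".join(kraken_cmd))
```

The returned lists can be handed to `subprocess.run(cmd, check=True)` if the respective engine is installed. Building the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.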

License

See the LICENSE.md file in the repository for more details.

Contact

For any queries or suggestions, feel free to open an issue in this repository, or contact us at OCR-Helpdesk. Thank you for exploring our OCR Model Catalogue. We hope this repository aids you in your text recognition projects and research!