/silver-eureka

Tiny tesseract wrapper.

Primary LanguagePythonMIT LicenseMIT

silver-eureka

Tiny tesseract wrapper.

The script runs tesseract for multiple languages, which typically give the best results with Finnish fraktur font:

potential_langs = ['deu_frak','fin','dan_frak']

(For some reason tesseract file.jpg -l deu_frak+fin+dan_frak didn't do what was desired.)

Prerequisites

Running of the script

python tesse_wrap.py -i <inputfile>

as an output you get a texts of various example languages.

Example file

You can use the sortavala.jpg , which originally is from 13.01.1844 Maamiehen Ystävä no 2 s. 4, and can be accessed from Digital collections of National Library of Finland

Todo

Incorporate omorfi in order to evaluate the goodness of the OCR against different language options.