tesseract-ocr/langdata

Language pack request: Accented Belarusian

tryzniak opened this issue · 2 comments

Hello. I'd like to try doing this on my own first, but I'm a bit unsure how. The idea is similar to the one in #8, but I want the same for Belarusian. I have a list of accented words; what should I do next? Thank you for any help.

The langdata repository is for legacy models, which are rarely used nowadays. Training such models basically requires two things: a small amount of training text that contains all desired glyphs (characters), and fonts to render images from that text.
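To check the "contains all desired glyphs" part before rendering anything, a rough Python sketch like the one below may help. It is not part of Tesseract or langdata. The function names, the sample words, and the choice of NFC normalization are assumptions for illustration. It collects every character used in the accented word list and reports the ones the training text does not cover.

```python
import unicodedata


def glyphs(text):
    """Return the set of non-whitespace characters in text after NFC normalization."""
    normalized = unicodedata.normalize("NFC", text)
    return {ch for ch in normalized if not ch.isspace()}


def missing_glyphs(word_list, training_text):
    """Return characters that occur in word_list but not in training_text."""
    needed = set()
    for word in word_list:
        needed |= glyphs(word)
    return needed - glyphs(training_text)


# Hypothetical sample data: stress is marked with U+0301 (combining acute accent).
words = ["вада\u0301", "мо\u0301ва"]
training_text = "мова вада"

missing = missing_glyphs(words, training_text)
print([f"U+{ord(ch):04X}" for ch in sorted(missing)])  # prints ['U+0301']
```

In this sample the training text has every base letter but no stress mark, so the combining accent is reported as missing. Cyrillic vowels have no precomposed accented forms, so the accent stays a separate code point after normalization.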

For "modern" models which use the neural network engine, I suggest using tesstrain with text scanned from printed books. That requires much more work.
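As far as I understand the tesstrain README, its ground truth is a folder of line images, each with a single-line transcription of the same name ending in `.gt.txt`. A hedged sketch for catching unpaired files before training could look like this. The helper name and the folder name `data/bel-accent-ground-truth` are hypothetical, and only plain `.tif` and `.png` image names are handled.

```python
from pathlib import Path

# Assumption: line images use a plain .tif or .png extension.
IMAGE_SUFFIXES = (".tif", ".png")


def unpaired_images(gt_dir):
    """Return names of line images in gt_dir that lack a matching .gt.txt file."""
    missing = []
    for image in sorted(Path(gt_dir).iterdir()):
        if image.suffix not in IMAGE_SUFFIXES:
            continue
        # line_0001.tif is expected to be paired with line_0001.gt.txt
        transcription = image.with_suffix(".gt.txt")
        if not transcription.is_file():
            missing.append(image.name)
    return missing


if __name__ == "__main__":
    gt_dir = Path("data/bel-accent-ground-truth")  # hypothetical folder name
    if gt_dir.is_dir():
        for name in unpaired_images(gt_dir):
            print("no transcription for", name)
```

Running it on the scanned lines before starting a training run shows which images still need to be transcribed.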

Thank you for the response! I'll try to do it as you suggested. GL