tesseract-ocr/langdata

Santali Language (Ol Chiki script) OCR

Prasanta-Hembram opened this issue · 0 comments

Hello everyone! I am new to coding, but when I learned about Tesseract I thought I would give it a try. I have the same issue as in Balinese Script OCR #152, but in my case I am using jTessBoxEditor 2.2.1 with Noto Sans Ol Chiki as the main Unicode font. This language actually has many Unicode fonts. I have been following Indic-ocr, but I have been unable to contact them to ask how they created and trained the Santali language, and they do not state the version of their sat.traineddata. I searched for Santali langdata in all the repositories but found none. I have tried to train this language myself, but I keep getting many errors. What is the most reliable way to train this language?

Fonts list: https://github.com/indicocr/tessdata/tree/master/sat