autoocr

A Python wrapper for the cross-platform Tesseract OCR engine, with support for multiple languages (e.g. Bangla).

Installation

pip3 install autoocr

from autoocr import AutoOCR # import the AutoOCR class

oa = AutoOCR(lang='bangla') # specify the language code

Set the tessdata folder. On macOS, run brew list tesseract to find the path. This is only needed once.

oa.set_datapath('/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata')

out_text = oa.get_text('image_ocr.jpg')

from autoocr import AutoOCR # import the AutoOCR class

oa = AutoOCR(lang='bangla') # specify the language code

oa.set_datapath('/path/to/tessdata')

out_text = oa.get_text('image_ocr.jpg')
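To run the same call over a whole folder of images, a small loop is enough. The following is a minimal sketch, not part of autoocr: ocr_images is a hypothetical helper, and it assumes get_text takes an image path and returns the recognized text as a string, as the snippet above suggests.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def ocr_images(
    get_text: Callable[[str], str],
    folder: str,
    extensions: Iterable[str] = (".jpg", ".jpeg", ".png"),
) -> Dict[str, str]:
    """Apply get_text to every image in folder and return {file name: text}."""
    allowed = {ext.lower() for ext in extensions}
    results = {}
    # Sort so the output order is stable across platforms.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = get_text(str(path))
    return results
```

With an AutoOCR instance set up as above, this could be called as ocr_images(oa.get_text, 'scans'), where 'scans' is a placeholder folder name. Passing the method in as a callable keeps the helper independent of how the OCR object was configured.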

from autoocr import AutoOCR # import the AutoOCR class

oa = AutoOCR(lang='bangla') # specify the language code

Set the tessdata folder. This is only needed once. On yum-based systems, run rpm -ql tesseract to find the location.

oa.set_datapath('/path/to/tessdata')

out_text = oa.get_text('image_ocr.jpg')
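The tessdata lookup can also be scripted instead of read off the terminal by hand. The sketch below is an illustration under stated assumptions, not part of autoocr: find_tessdata_dir is a hypothetical helper, and it assumes the output of brew list tesseract or rpm -ql tesseract contains at least one path with a tessdata component.

```python
from pathlib import PurePosixPath
from typing import Optional


def find_tessdata_dir(listing: str) -> Optional[str]:
    """Return the first 'tessdata' directory found in a package file listing.

    `listing` is the text printed by a command such as
    `brew list tesseract` or `rpm -ql tesseract`, one path per line.
    """
    for line in listing.splitlines():
        parts = PurePosixPath(line.strip()).parts
        if "tessdata" in parts:
            # Keep everything up to and including the 'tessdata' component.
            idx = parts.index("tessdata")
            return str(PurePosixPath(*parts[: idx + 1]))
    return None
```

The listing text can be captured with, for example, subprocess.run(['rpm', '-ql', 'tesseract'], capture_output=True, text=True).stdout, and the result passed to oa.set_datapath. If the function returns None, the package listing did not mention a tessdata folder and the path has to be located manually.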

This project is licensed under the MIT License - see the LICENSE file for details.