Implementing OCR (Optical Character Recognition) using Python and the Tesseract library

Installation:

  • Install tesseract-ocr using this command:
sudo apt-get install tesseract-ocr
  • Install PyOCR, the Python binding for Tesseract, using this pip command:
pip install git+https://github.com/jflesch/pyocr.git
  • Install Pillow, the Python image processing library, using this pip command:
pip install pillow
  • Install Wand, the Python bindings for ImageMagick (ImageMagick itself must also be installed on the system), using this pip command:
pip install wand
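Before running anything, it can help to confirm that the pieces installed above are actually visible to Python. The following is a minimal sketch using only the standard library; the function name `missing_dependencies` is hypothetical and not part of this repo. It checks for the `tesseract` binary on `PATH` and for the importable modules behind each pip package (note that Pillow is imported as `PIL`).

```python
import importlib.util
import shutil

# pip package name -> module name it is imported under.
REQUIRED_MODULES = {"pyocr": "pyocr", "pillow": "PIL", "wand": "wand"}


def missing_dependencies():
    """Return a list of human-readable descriptions of missing dependencies."""
    missing = []
    # PyOCR drives the tesseract command-line tool, so it must be on PATH.
    if shutil.which("tesseract") is None:
        missing.append("tesseract-ocr (tesseract binary not found on PATH)")
    # find_spec only checks that the package is installed; it does not import it,
    # so it cannot tell whether ImageMagick is present for Wand.
    for package, module in REQUIRED_MODULES.items():
        if importlib.util.find_spec(module) is None:
            missing.append(f"{package} (cannot import {module!r})")
    return missing


problems = missing_dependencies()
if problems:
    print("Missing dependencies:")
    for problem in problems:
        print("  -", problem)
else:
    print("All dependencies found.")
```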

Setup and usage:

  • Clone this repo:
git clone https://github.com/samuelayo/ocr.git
  • Change into the repo directory:
cd ocr
  • Read text from an image or PDF:
python3 ocr.py "MY_FILE_PATH"

MY_FILE_PATH is the path of the image or PDF file you want to run OCR on.
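One way a script like ocr.py could tie the installed libraries together is sketched below. This is a hypothetical outline, not the repo's actual code: the helper names (`is_pdf`, `ocr_image`, `pdf_pages_as_images`, `read_text`, `main`) are illustrative, and it assumes PyOCR picks Tesseract as its first available tool, English (`"eng"`) language data is installed, and PDF pages are rasterized at 300 DPI through Wand/ImageMagick before OCR.

```python
import io
import sys
from pathlib import Path


def is_pdf(path):
    """Choose the loading strategy from the file extension (assumption)."""
    return Path(path).suffix.lower() == ".pdf"


def ocr_image(image, lang="eng"):
    """Run the first OCR tool PyOCR finds (normally Tesseract) on a PIL image."""
    import pyocr
    import pyocr.builders

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is tesseract-ocr installed?")
    return tools[0].image_to_string(
        image, lang=lang, builder=pyocr.builders.TextBuilder()
    )


def pdf_pages_as_images(path, resolution=300):
    """Rasterize each PDF page with Wand/ImageMagick and yield it as a PIL image."""
    from PIL import Image
    from wand.image import Image as WandImage

    with WandImage(filename=path, resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Copy the single page into its own image so it can be encoded as PNG.
            with WandImage(image=page) as single_page:
                png_bytes = single_page.make_blob("png")
            yield Image.open(io.BytesIO(png_bytes))


def read_text(path):
    """OCR every page of a PDF, or the single image at `path`."""
    if is_pdf(path):
        return "\n\n".join(ocr_image(page) for page in pdf_pages_as_images(path))
    from PIL import Image

    with Image.open(path) as image:
        return ocr_image(image)


def main(argv):
    """Command-line entry point: expects exactly one file path argument."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} MY_FILE_PATH", file=sys.stderr)
        return 2
    print(read_text(argv[1]))
    return 0
```

The third-party imports are kept inside the functions so that argument handling and the file-type check work even on a machine where PyOCR or Wand is missing. To run the sketch as a script, end the file with `if __name__ == "__main__": sys.exit(main(sys.argv))`.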

  • Run the unit tests:
python3 -m unittest discover
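By default, `python3 -m unittest discover` searches the current directory for modules whose names match `test*.py` and runs the `unittest.TestCase` classes it finds in them. The following is a minimal sketch of such a test module; the file name (for example `test_ocr_helpers.py`) and the `is_pdf` helper are hypothetical. The helper is redefined here so the block is self-contained, whereas a real test would import it from ocr.py.

```python
import unittest
from pathlib import Path


def is_pdf(path):
    # Hypothetical helper under test; a real test module would import it from ocr.py.
    return Path(path).suffix.lower() == ".pdf"


class IsPdfTest(unittest.TestCase):
    def test_pdf_extension_is_case_insensitive(self):
        self.assertTrue(is_pdf("scans/invoice.PDF"))

    def test_image_is_not_pdf(self):
        self.assertFalse(is_pdf("photos/receipt.png"))

    def test_file_without_extension_is_not_pdf(self):
        self.assertFalse(is_pdf("README"))
```

Tests like these exercise the pure helper logic only, so they run without Tesseract or ImageMagick installed.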