Aim

Parse documents with a regular layout using OCR. This is hacky code with poor style (for example, it mixes lists and NumPy arrays freely), so use it with care.

Installation

For this code to run, it is necessary to install the OCR engine Tesseract. This is possible on both Linux and Windows. On Ubuntu, sudo apt install tesseract-ocr usually does the trick.

On Windows, it might also be necessary to install poppler.
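
To check that the external tools are installed before running the code, something like the following sketch may help. It only tests whether the executables can be found on the PATH. The helper name find_missing_tools is made up for this example, and checking for pdftoppm (one of the command-line programs shipped with poppler) is an assumption about which poppler tool is actually needed.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "tesseract" is the Tesseract command-line program.
    # "pdftoppm" ships with poppler; that this project needs it
    # specifically is an assumption, not something stated above.
    missing = find_missing_tools(["tesseract", "pdftoppm"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

If a tool is reported as missing on Windows even though it is installed, its install directory probably still needs to be added to the PATH environment variable.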