Aim
Parse documents with a regular layout using OCR. This is hacky code with bad style (for example, I mix lists and NumPy arrays freely), so use it with care.
Installation
For this code to run, it is necessary to install the OCR engine Tesseract.
This is possible on both Linux and Windows. On Ubuntu, sudo apt install tesseract-ocr
usually does the trick.
On Windows it might also be necessary to install poppler.
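
To check whether the external tools are reachable before running the code, something like the following may help. It is only a minimal sketch and not part of this repository: the helper name missing_tools is hypothetical, and it assumes the executables are called tesseract and pdftoppm (one of the command-line tools shipped with poppler) and that they are on the PATH.

```python
import shutil


def missing_tools(names=("tesseract", "pdftoppm")):
    """Return the names from `names` that cannot be found on PATH.

    The default names are assumptions: `tesseract` is the Tesseract
    command-line binary, `pdftoppm` is a tool shipped with poppler.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

If a tool is reported as missing on Windows even though it is installed, its install directory probably still needs to be added to the PATH environment variable.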