An OCR scanner for receipts

Primary language: Python. License: MIT.

pentaplex

A receipt scanner and reader that uses tesseract-ocr and imagemagick. It performs five basic steps (hence the program’s name):

  1. scan receipt image (edge detection and warp transformation with opencv)
  2. preprocess scan (clean, sharpen, and contrast)
  3. run OCR (tesseract for optical character recognition)
  4. analyze OCR output (with fuzzy finder and preconfigured dictionary)
  5. summarize analysis in a csv file
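
As a rough illustration of steps 4 and 5, here is a minimal sketch in Python. It is not pentaplex's actual code: the KEYWORDS dictionary, the function names, and the cutoff value are all hypothetical, and the standard-library difflib module stands in for whatever fuzzy finder the program really uses. The idea is to match each OCR line's label against known keywords, tolerating OCR misreads such as "T0TAL", and to write the recognized fields to csv.

```python
import csv
import difflib
import io
import re

# Hypothetical dictionary: canonical field -> keywords expected on a receipt.
KEYWORDS = {
    "total": ["total", "sum", "amount due"],
    "tax": ["tax", "vat"],
}

# A price at the end of a line, with either "." or "," as decimal separator.
PRICE = re.compile(r"(\d+[.,]\d{2})\s*$")


def match_field(text, keywords=KEYWORDS, cutoff=0.75):
    """Return the field whose keyword is most similar to text, or None."""
    best_field, best_score = None, cutoff
    for field, words in keywords.items():
        for word in words:
            score = difflib.SequenceMatcher(None, text.lower(), word).ratio()
            if score >= best_score:
                best_field, best_score = field, score
    return best_field


def analyze(ocr_text):
    """Extract (field, value) rows from raw OCR output, line by line."""
    rows = []
    for line in ocr_text.splitlines():
        m = PRICE.search(line)
        if not m:
            continue  # no price on this line
        label = line[:m.start()].strip(" :.")
        field = match_field(label)
        if field:
            value = float(m.group(1).replace(",", "."))
            rows.append({"field": field, "value": value})
    return rows


def to_csv(rows):
    """Serialize the analysis rows to csv text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["field", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = "MILK 1.99\nT0TAL: 12,50\nVAT 0.95\n"
    print(to_csv(analyze(sample)))
```

Lowering the cutoff tolerates more OCR errors but also raises the chance of false matches, so the value chosen here is only a starting point.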

To prepare for scanning, create a directory called imgs/ in the repository and place pictures of the receipts in it. For example, in a terminal (cd into the repository first), type something like:

mkdir -p imgs/
cp ~/Downloads/*.JPG imgs/

Prerequisites

This program uses tesseract-ocr, imagemagick, and opencv.

Usage

To run pentaplex, cd into the repository and type:

./pentaplex [optional: auto]

Documentation

For code documentation visit: https://phdenzel.github.io/pentaplex/