A fuzzy receipt parser written in Python
Note: While I will accept pull requests, I'm not planning to add any more functionality to this project myself.
Updating your housekeeping book is a tedious task: You need to manually find the shop name, the date and the total from every receipt. Then you need to write it down. At the end you want to calculate a sum of all bills. Nasty. So why not let a machine do it?
This is a fuzzy receipt parser written in Python. You give it any dirty old receipt lying around and it will try its best to find the correct data for you.
It started as a hackathon project. Read more about it on the trivago techblog. Also read the comments on HackerNews Oh hey! And there's also a talk online now if you're the visual kind of person.
Dependencies
Usage
To convert all images from the data/img/
folder to text using tesseract and parse the resulting text files, run
make run
Docker
A Dockerfile is available with all dependencies needed to run the program.
To build the image, run
make docker-build
To run it on the sample files, try
make docker-run
By default, running the image will execute the make run
command. To use with your own images, run the following:
docker run -v <path_to_input_images>:/usr/src/app/data/img mre0/receipt-parser