Blueprint is a declarative extraction language for semi-structured documents.
Start by cloning this repo to your machine.
To run on a sample paystub:
- Add path/to/blueprint-oss/blueprint/py to your PYTHONPATH
- Run pip3 install -r path/to/blueprint-oss/blueprint/requirements.txt
- From the blueprint/reference_extractions/paystubs folder, run python3 paystubs.py run_model -v -g ocr/sample_paystub.jpg.json
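If you would rather not change PYTHONPATH globally, the steps above can be wrapped in a small launcher. This is only a sketch: the helper names blueprint_env and paystub_command are hypothetical and not part of Blueprint, and repo_root stands for wherever you cloned the repo.

```python
import os
from pathlib import Path


def blueprint_env(repo_root, base_env=None):
    """Return a copy of the environment with <repo_root>/blueprint/py
    prepended to PYTHONPATH (hypothetical helper, not part of Blueprint)."""
    env = dict(os.environ if base_env is None else base_env)
    py_dir = str(Path(repo_root) / "blueprint" / "py")
    existing = env.get("PYTHONPATH")
    # Keep any existing entries, but put the Blueprint sources first.
    env["PYTHONPATH"] = py_dir if not existing else py_dir + os.pathsep + existing
    return env


def paystub_command(ocr_json="ocr/sample_paystub.jpg.json"):
    """Build the sample command from the steps above as an argument list."""
    return ["python3", "paystubs.py", "run_model", "-v", "-g", ocr_json]


# Example use (run after installing requirements.txt):
#   import subprocess
#   root = Path("path/to/blueprint-oss")
#   subprocess.run(
#       paystub_command(),
#       cwd=root / "blueprint" / "reference_extractions" / "paystubs",
#       env=blueprint_env(root),
#       check=True,
#   )
```

The environment is copied rather than modified in place, so the PYTHONPATH change applies only to the launched process.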
To generate your own OCR documents:
- Upload an image here: https://cloud.google.com/vision/docs/drag-and-drop
- Navigate to the Text tab and click "Show JSON"
- Save a JSON file with the entire Response
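Before running a model on a saved file, it can help to confirm that the JSON actually contains recognized text. The sketch below assumes the saved response uses the Cloud Vision field names fullTextAnnotation and textAnnotations, and that it may or may not be wrapped in a top-level "responses" list. The function names are hypothetical, and this says nothing about which fields Blueprint itself reads.

```python
import json


def extract_full_text(data):
    """Return the recognized text from a parsed Vision response, or ""."""
    # Assumption: some exports wrap the result in {"responses": [...]}.
    if isinstance(data, dict) and "responses" in data:
        responses = data["responses"]
        data = responses[0] if responses else {}
    # Prefer the document-level text when present.
    full = data.get("fullTextAnnotation") or {}
    if full.get("text"):
        return full["text"]
    # Fall back to the first text annotation, which covers the whole image.
    annotations = data.get("textAnnotations") or []
    if annotations:
        return annotations[0].get("description", "")
    return ""


def load_full_text(path):
    """Read a saved response file and return its recognized text."""
    with open(path, encoding="utf-8") as f:
        return extract_full_text(json.load(f))
```

An empty string from load_full_text suggests the file does not hold the entire response, or that no text was detected in the image.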