
Convert a PDF document into a JSON object using pdftotext with bounding-box layout information. The returned object has simple methods for searching the PDF document by key and layout position.


serialize_pdf


Convert PDF documents into JSON objects. The package also provides methods for searching within the document using regular expressions, and returns the matched text together with its bounding-box information.

from serialize_pdf import serialize_pdf

pdf = serialize_pdf.serialize('document.pdf')

# Find values in the document that match a regular expression
kvs = pdf.get_kv(key='Net sales', val_regex='Net sales.{1,20}[0-9,]*')
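To make the idea of "text with bounding boxes" concrete, here is a minimal standalone sketch. It does not use this package and is not its implementation. It assumes the XHTML shape that Poppler's `pdftotext -bbox-layout` emits (nested `page`, `line` and `word` elements carrying `xMin`, `yMin`, `xMax`, `yMax` attributes) and parses a small hand-written sample instead of a real PDF. The helper names `parse_bbox_layout` and `find_lines` are hypothetical and chosen only for this illustration.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the shape assumed for `pdftotext -bbox-layout` output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><doc>
<page width="612.0" height="792.0">
<flow><block xMin="72.0" yMin="100.0" xMax="230.0" yMax="126.0">
<line xMin="72.0" yMin="100.0" xMax="230.0" yMax="112.0">
<word xMin="72.0" yMin="100.0" xMax="95.0" yMax="112.0">Net</word>
<word xMin="98.0" yMin="100.0" xMax="130.0" yMax="112.0">sales</word>
<word xMin="180.0" yMin="100.0" xMax="230.0" yMax="112.0">1,234,567</word>
</line>
<line xMin="72.0" yMin="114.0" xMax="230.0" yMax="126.0">
<word xMin="72.0" yMin="114.0" xMax="100.0" yMax="126.0">Cost</word>
<word xMin="103.0" yMin="114.0" xMax="115.0" yMax="126.0">of</word>
<word xMin="118.0" yMin="114.0" xMax="150.0" yMax="126.0">sales</word>
<word xMin="185.0" yMin="114.0" xMax="230.0" yMax="126.0">765,432</word>
</line>
</block></flow>
</page>
</doc></body></html>"""

NS = "{http://www.w3.org/1999/xhtml}"  # XHTML namespace used by the sample
BBOX_KEYS = ("xMin", "yMin", "xMax", "yMax")


def bbox_of(elem):
    """Read the four bounding-box attributes of an element as floats."""
    return {key: float(elem.get(key)) for key in BBOX_KEYS}


def parse_bbox_layout(xhtml):
    """Hypothetical helper: flatten the XHTML into JSON-friendly line records."""
    root = ET.fromstring(xhtml)
    lines = []
    for page_no, page in enumerate(root.iter(NS + "page"), start=1):
        for line in page.iter(NS + "line"):
            words = [{"text": w.text or "", "bbox": bbox_of(w)}
                     for w in line.iter(NS + "word")]
            lines.append({
                "page": page_no,
                "text": " ".join(w["text"] for w in words),
                "bbox": bbox_of(line),
                "words": words,
            })
    return lines


def find_lines(lines, pattern):
    """Hypothetical helper: keep the lines whose text matches the regex."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line["text"])]


lines = parse_bbox_layout(SAMPLE)
hits = find_lines(lines, r"Net sales.{1,20}[0-9,]+")
print(json.dumps([{"text": h["text"], "bbox": h["bbox"]} for h in hits], indent=2))
```

Each hit keeps both the matched line text and its coordinates, so a caller could go on to filter by position on the page. Matching is done per line here for simplicity; the package itself may group or search text differently.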

Features

  • TODO

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

PDF Serialize: https://github.com/JoshData/pdf-diff