curiosity-ai/catalyst

Collect important details from invoice documents (PDF)

Hi all,

I want to build a project that extracts important details from invoice PDFs (such as Invoice Number, Date, Total Due, and Seller Name) as key-value pairs.
We generate an hOCR file from each PDF using the Tesseract OCR engine.
Could you please advise how to proceed from the hOCR file to extract key-value pairs using "catalyst"?

Alternatively, is there another approach to producing key-value pairs with "catalyst"?
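
To make the goal concrete, here is a minimal sketch in Python (not Catalyst) of the input and output we have in mind. It assumes the hOCR uses the `ocr_line` and `ocrx_word` span classes; the sample markup, the helper names (`extract_lines`, `extract_fields`) and the regular expressions are made up for illustration and are not a real invoice or a recommended method.

```python
import re
from html.parser import HTMLParser


class HocrLineParser(HTMLParser):
    """Collect the words of each hOCR line (span.ocr_line > span.ocrx_word)."""

    def __init__(self):
        super().__init__()
        self.lines = []    # one list of words per ocr_line
        self._stack = []   # class attribute of each currently open <span>

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        cls = dict(attrs).get("class") or ""
        self._stack.append(cls)
        if cls == "ocr_line":
            self.lines.append([])

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Keep text only when the innermost open span is a word.
        inside_word = bool(self._stack) and self._stack[-1] == "ocrx_word"
        if inside_word and self.lines and data.strip():
            self.lines[-1].append(data.strip())


def extract_lines(hocr):
    """Return the text of every non-empty hOCR line, in document order."""
    parser = HocrLineParser()
    parser.feed(hocr)
    return [" ".join(words) for words in parser.lines if words]


# Hypothetical label patterns; real invoices will need more variants.
PATTERNS = {
    "Invoice Number": re.compile(r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\S+)", re.I),
    "Date": re.compile(r"\bDate\s*:?\s*(\S+)", re.I),
    "Total Due": re.compile(r"Total\s*Due\s*:?\s*(\S+)", re.I),
}


def extract_fields(lines):
    """Map each known label to the first value found after it on a line."""
    fields = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and key not in fields:
                fields[key] = match.group(1)
    return fields


# Made-up hOCR fragment, for illustration only.
SAMPLE_HOCR = """
<div class='ocr_page'>
 <span class='ocr_line' title='bbox 10 10 300 30'>
  <span class='ocrx_word' title='bbox 10 10 80 30'>Invoice</span>
  <span class='ocrx_word' title='bbox 90 10 160 30'>Number:</span>
  <span class='ocrx_word' title='bbox 170 10 300 30'>INV-001</span>
 </span>
 <span class='ocr_line' title='bbox 10 40 300 60'>
  <span class='ocrx_word' title='bbox 10 40 60 60'>Date:</span>
  <span class='ocrx_word' title='bbox 70 40 200 60'>2020-01-15</span>
 </span>
 <span class='ocr_line' title='bbox 10 70 300 90'>
  <span class='ocrx_word' title='bbox 10 70 60 90'>Total</span>
  <span class='ocrx_word' title='bbox 70 70 110 90'>Due:</span>
  <span class='ocrx_word' title='bbox 120 70 200 90'>$120.00</span>
 </span>
</div>
"""

if __name__ == "__main__":
    print(extract_fields(extract_lines(SAMPLE_HOCR)))
```

This only works when the label and its value sit on the same OCR line, which is why we are asking whether "catalyst" offers a more robust way to do it.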

Thanks in advance.