invoice-extractor

A Python implementation that extracts structured data from an image of an invoice.

Flow:

original invoice

preprocessing

removing lines

Lines are removed so that the text contours can be detected accurately.
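The line-removal step can be sketched as follows. This is a minimal illustration in plain Python on a toy binary grid, not the repository's actual code: it assumes the page is already binarized (1 = ink, 0 = background), treats any straight run of ink at least `min_len` pixels long as a ruling line, and clears it. The names `long_run_mask`, `remove_lines`, and `min_len` are hypothetical. An image library would typically achieve the same effect with a morphological opening using long, thin kernels.

```python
def long_run_mask(grid, min_len, horizontal=True):
    """Mark ink pixels that belong to a straight run of at least min_len pixels."""
    rows, cols = len(grid), len(grid[0])
    mask = [[0] * cols for _ in range(rows)]
    outer, inner = (rows, cols) if horizontal else (cols, rows)
    for i in range(outer):
        start = None
        for j in range(inner + 1):  # one extra step flushes a run ending at the border
            ink = j < inner and (grid[i][j] if horizontal else grid[j][i])
            if ink and start is None:
                start = j
            elif not ink and start is not None:
                if j - start >= min_len:
                    for k in range(start, j):
                        if horizontal:
                            mask[i][k] = 1
                        else:
                            mask[k][i] = 1
                start = None
    return mask


def remove_lines(grid, min_len):
    """Clear every pixel covered by the horizontal or the vertical line mask."""
    h_mask = long_run_mask(grid, min_len, horizontal=True)
    v_mask = long_run_mask(grid, min_len, horizontal=False)
    return [
        [0 if h_mask[r][c] or v_mask[r][c] else grid[r][c]
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


# Toy 5x8 page: a full-width rule on row 0, a vertical rule in column 0,
# and a short 2-pixel stroke of "text" on row 2.
page = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_lines(page, min_len=5)
```

In this toy example only the two text pixels on row 2 survive, because both rules are at least `min_len` pixels long while the text stroke is not. The threshold has to be longer than the widest character stroke, otherwise text would be erased as well.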

mask obtained for vertical and horizontal lines

after applying mask

Obtained graph

The graph is obtained after getting the contours and merging them on the basis of their size and nearness.

* The red boxes are the identified key fields. The key fields can be changed according to the keywords given in labels.csv and label_synonnyms.csv.
* The green boxes are the values.
* The relation between a key field and its possible values is shown using straight lines.
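The merging and linking described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the repository's implementation: boxes are `(x, y, w, h)` tuples, two boxes are merged when the gap between them is at most `ratio` times the smaller box height (one possible reading of "size and nearness"), and a key field is linked to the value box whose centre is closest. All function names and the `ratio` parameter are hypothetical.

```python
import math


def near(a, b, ratio=0.5):
    """True if boxes a and b are close relative to their size."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Gap between the boxes along each axis (0 when they overlap on that axis).
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    # Assumed rule: the allowed gap scales with the smaller box height.
    limit = ratio * min(ah, bh)
    return dx <= limit and dy <= limit


def union(a, b):
    """Smallest box that contains both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_boxes(boxes, ratio=0.5):
    """Repeatedly merge any pair of nearby boxes until no pair is left."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if near(boxes[i], boxes[j], ratio):
                    boxes[i] = union(boxes[i], boxes.pop(j))
                    changed = True
                    break
            if changed:
                break
    return boxes


def centre(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def nearest_value(key_box, value_boxes):
    """Return the value box whose centre is closest to the key box's centre."""
    kx, ky = centre(key_box)
    return min(
        value_boxes,
        key=lambda v: math.hypot(centre(v)[0] - kx, centre(v)[1] - ky),
    )


# Three character-sized boxes on one text line plus one distant box.
chars = [(0, 0, 10, 10), (12, 0, 10, 10), (24, 0, 10, 10), (100, 0, 10, 10)]
merged = merge_boxes(chars)

# Link the merged word (treated here as a key field) to its closest candidate value.
linked = nearest_value(merged[0], [(100, 0, 10, 10), (0, 200, 30, 10)])
```

Scaling the allowed gap with box height keeps small print from being merged across wide gaps while still joining the characters of larger text. Linking to the single nearest box is a simplification; a key field could equally keep several candidate values ranked by distance.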

Output csv
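As a rough sketch of how recognized key-field text might be mapped to labels and written out as CSV: the synonym table below is hypothetical, since the actual contents of labels.csv and label_synonnyms.csv are not shown here, and the `field,value` column layout is likewise an assumption made for illustration.

```python
import csv
import io

# Hypothetical synonym table standing in for the keyword files.
SYNONYMS = {
    "invoice no": "invoice_number",
    "invoice number": "invoice_number",
    "date": "invoice_date",
    "total": "total_amount",
}


def canonical_label(text):
    """Map recognized text to a canonical label, or None if it is not a key field."""
    cleaned = text.lower().strip(" :.#")
    return SYNONYMS.get(cleaned)


def to_csv(pairs):
    """Write (key text, value text) pairs as CSV, skipping unrecognized keys."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value"])
    for key_text, value_text in pairs:
        label = canonical_label(key_text)
        if label is not None:
            writer.writerow([label, value_text])
    return buf.getvalue()


pairs = [("Invoice No:", "INV-001"), ("Date", "2020-01-31"), ("Thank you", "")]
output = to_csv(pairs)
```

Normalizing case and trailing punctuation before the lookup lets variants such as "Invoice No:" and "invoice no" resolve to the same label, and text that matches no keyword is simply left out of the output.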
