invoice-extractor

A Python implementation that extracts structured data from an image of an invoice.

Flow:

original invoice

preprocessing

removing lines

Lines are removed so that text contours can be detected accurately.

mask obtained for vertical and horizontal lines

after applying mask
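The line-removal idea can be sketched without any image library. This is only a minimal illustration and not necessarily how this project does it: it assumes the invoice has already been binarized into a list of rows of 0/1 pixels, and it treats any horizontal or vertical run of at least `min_len` foreground pixels as a ruling line. The names `line_mask`, `apply_mask` and `min_len` are hypothetical.

```python
def line_mask(img, min_len):
    """Mark pixels that belong to a horizontal or vertical run of >= min_len pixels."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]

    # Horizontal runs: scan every row from left to right.
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = 1
            else:
                x += 1

    # Vertical runs: scan every column from top to bottom.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        mask[i][x] = 1
            else:
                y += 1
    return mask


def apply_mask(img, mask):
    """Clear every pixel that the mask marks as part of a line."""
    return [[0 if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]


# Toy 6x6 "invoice": a top border, a left border, and a two-pixel text blob.
img = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
mask = line_mask(img, min_len=5)
cleaned = apply_mask(img, mask)
```

In this toy input only the two text pixels in row 3 survive, because the borders are longer than `min_len` and the text run is not. On a real scan `min_len` would have to be chosen well above the width of the widest character.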

Obtained graph

After getting the contours and merging them on the basis of their size and nearness:
* the red boxes are the identified keyfields; the keyfields can be changed according to the keywords given in labels.csv and label_synonnyms.csv
* the green boxes are the values
* the relation between a keyfield and its possible values is shown using straight lines
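A rough sketch of the merging and linking steps follows. The exact rules are not described here, so everything below is an assumption made for illustration: boxes are `(x, y, w, h)` tuples, two boxes count as near when the gap between them is at most `factor` times the smaller box height, and a keyfield is linked to every value whose center lies within `max_dist`. Which box is a keyfield is hard-coded in the example; in the project that decision comes from the keywords in labels.csv and label_synonnyms.csv.

```python
import math


def near(a, b, factor=0.5):
    """True if the gap between a and b is small relative to their height."""
    gap = factor * min(a[3], b[3])
    return (a[0] - gap <= b[0] + b[2] and b[0] - gap <= a[0] + a[2] and
            a[1] - gap <= b[1] + b[3] and b[1] - gap <= a[1] + a[3])


def union(a, b):
    """Smallest box that contains both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_boxes(boxes, factor=0.5):
    """Repeatedly merge any two near boxes until no pair is left to merge."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if near(boxes[i], boxes[j], factor):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


def center(box):
    return (box[0] + box[2] / 2, box[1] + box[3] / 2)


def link(keyfields, values, max_dist):
    """Map each keyfield box to the value boxes whose centers are within max_dist."""
    edges = {}
    for k in keyfields:
        kx, ky = center(k)
        edges[k] = [v for v in values
                    if math.hypot(center(v)[0] - kx, center(v)[1] - ky) <= max_dist]
    return edges


# Three character-sized boxes close together and one box far to the right.
contours = [(0, 0, 10, 10), (12, 0, 10, 10), (24, 0, 10, 10), (100, 0, 10, 10)]
merged = merge_boxes(contours)

# Assume the merged word on the left is a keyfield and the other box is a value.
edges = link(keyfields=[merged[0]], values=[merged[1]], max_dist=100)
```

Here the three neighbouring boxes collapse into one word-sized box, the distant box stays separate, and `edges` holds one keyfield with one candidate value, which corresponds to a single straight line in the graph.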

piyushmathur17/invoice-extractor