Given an image of a business card, extract the email address printed on it.
Start with a photo in which the edges of the card are visible.
Detect edges with the Canny detector and find the contours in the resulting edge map.
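The last stage of the Canny detector, hysteresis thresholding, decides which gradient pixels actually become edges. In practice this is what OpenCV's `cv::Canny` does internally, and `cv::findContours` is then run on the edge map. The dependency-free sketch below only illustrates the hysteresis idea. It assumes the gradient magnitudes have already been computed and thinned, and the names `hysteresis`, `Grid` and `Mask` are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // gradient magnitudes, row-major
using Mask = std::vector<std::vector<bool>>;    // true = edge pixel

// Hysteresis thresholding: keep every pixel whose magnitude is >= high
// ("strong"), plus every pixel >= low ("weak") that is 8-connected to a
// strong pixel through other weak pixels. Everything else is discarded.
Mask hysteresis(const Grid& mag, double low, double high) {
    assert(low <= high);
    const int rows = static_cast<int>(mag.size());
    const int cols = rows ? static_cast<int>(mag[0].size()) : 0;
    Mask edge(rows, std::vector<bool>(cols, false));
    std::queue<std::pair<int, int>> frontier;

    // Seed the search with all strong pixels.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (mag[r][c] >= high) {
                edge[r][c] = true;
                frontier.push({r, c});
            }

    // Breadth-first growth into neighbouring weak pixels.
    while (!frontier.empty()) {
        const auto [r, c] = frontier.front();
        frontier.pop();
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                const int nr = r + dr, nc = c + dc;
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (!edge[nr][nc] && mag[nr][nc] >= low) {
                    edge[nr][nc] = true;
                    frontier.push({nr, nc});
                }
            }
    }
    return edge;
}
```

A weak pixel that touches no strong pixel is dropped, which is what suppresses isolated noise while keeping faint stretches of a genuine card edge.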
Choose the best contour and crop the image to its bounding box, which may be rotated.
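What makes a contour "best" is not stated here. A minimal sketch, assuming the criterion is the largest enclosed area: compute each contour's area with the shoelace formula, keep the largest, and take its axis-aligned bounding box. OpenCV provides `cv::contourArea`, `cv::boundingRect` and, for the rotated case, `cv::minAreaRect`; the types and functions below are hypothetical stand-ins that need no libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { int x, y; };
struct Box { int x, y, width, height; };  // inclusive pixel extent
using Contour = std::vector<Point>;

// Enclosed area of a closed polygon via the shoelace formula.
double polygonArea(const Contour& c) {
    double twice = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        const Point& a = c[i];
        const Point& b = c[(i + 1) % c.size()];
        twice += static_cast<double>(a.x) * b.y - static_cast<double>(b.x) * a.y;
    }
    return std::abs(twice) / 2.0;
}

// Assumed criterion: the "best" contour is the one enclosing the largest area.
std::size_t bestContour(const std::vector<Contour>& contours) {
    assert(!contours.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < contours.size(); ++i)
        if (polygonArea(contours[i]) > polygonArea(contours[best])) best = i;
    return best;
}

// Axis-aligned bounding box of a non-empty contour.
Box boundingBox(const Contour& c) {
    assert(!c.empty());
    int minX = c[0].x, maxX = c[0].x, minY = c[0].y, maxY = c[0].y;
    for (const Point& p : c) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```

For a card photographed at an angle, the axis-aligned box would include background; that is where a rotated rectangle and a perspective or rotation correction come in.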
Use morphological operations to find regions that look like text fields, and isolate each one.
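One common way to find text fields is to dilate a binarised image with a wide, flat structuring element, so that neighbouring characters merge into one blob per line, and then treat each connected blob as a field. The sketch below assumes a binary image (1 = ink) and a horizontal kernel; the project's actual kernel and sequence of operations are not documented here. With OpenCV this would typically use `cv::getStructuringElement` with `cv::dilate` or `cv::morphologyEx`. `dilateHorizontal`, `findFields` and `Field` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Binary = std::vector<std::vector<int>>;   // 1 = ink, 0 = background
struct Field { int top, left, bottom, right; };  // inclusive bounds

// Dilation with a 1 x (2*radius+1) horizontal kernel: a pixel becomes ink
// if any pixel within `radius` columns on the same row is ink.
Binary dilateHorizontal(const Binary& img, int radius) {
    assert(radius >= 0);
    Binary out = img;
    for (std::size_t r = 0; r < img.size(); ++r) {
        const int cols = static_cast<int>(img[r].size());
        for (int c = 0; c < cols; ++c) {
            if (!img[r][c]) continue;
            for (int k = std::max(0, c - radius); k <= std::min(cols - 1, c + radius); ++k)
                out[r][k] = 1;
        }
    }
    return out;
}

// Label 4-connected ink regions and return one bounding box per region,
// in the order their top-left-most pixel is met in a row-major scan.
std::vector<Field> findFields(const Binary& img) {
    static constexpr int dy[] = {-1, 1, 0, 0};
    static constexpr int dx[] = {0, 0, -1, 1};
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));
    std::vector<Field> fields;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (!img[r][c] || seen[r][c]) continue;
            Field f{r, c, r, c};
            std::queue<std::pair<int, int>> q;
            q.push({r, c});
            seen[r][c] = true;
            while (!q.empty()) {
                const auto [y, x] = q.front();
                q.pop();
                f.top = std::min(f.top, y);
                f.bottom = std::max(f.bottom, y);
                f.left = std::min(f.left, x);
                f.right = std::max(f.right, x);
                for (int i = 0; i < 4; ++i) {
                    const int ny = y + dy[i], nx = x + dx[i];
                    if (ny < 0 || nx < 0 || ny >= rows || nx >= cols) continue;
                    if (img[ny][nx] && !seen[ny][nx]) {
                        seen[ny][nx] = true;
                        q.push({ny, nx});
                    }
                }
            }
            fields.push_back(f);
        }
    return fields;
}
```

Without dilation every character (here, every isolated pixel) would be its own field; the dilation radius controls how large a gap may still be bridged within one field.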
Run OCR on every detected field to obtain its text. Finally, select the text that most resembles an email address. In interactive mode, the text in the current field is recognised instead.
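The similarity measure used to rank the OCR results is not specified. As an illustration only, the hypothetical `emailScore` below rewards a full match of a simplified address pattern, an address embedded in other text, an `@` followed by a dot, and the absence of spaces; `bestEmailCandidate` then keeps the highest-scoring field text. The pattern and weights are arbitrary assumptions, and the input strings stand in for whatever the OCR step returns.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical heuristic: how much does an OCR result look like an email address?
int emailScore(const std::string& text) {
    // Simplified address pattern; real addresses allow more than this.
    static const std::regex email(R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})");
    if (std::regex_match(text, email)) return 100;       // the whole text is an address
    int score = 0;
    if (std::regex_search(text, email)) score += 50;     // an address is embedded
    const auto at = text.find('@');
    if (at != std::string::npos) {
        score += 20;
        if (text.find('.', at) != std::string::npos) score += 10;  // dot after the '@'
    }
    if (text.find(' ') == std::string::npos) score += 5;  // addresses contain no spaces
    return score;
}

// Return the candidate with the highest score (the first one wins ties).
std::string bestEmailCandidate(const std::vector<std::string>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i)
        if (emailScore(candidates[i]) > emailScore(candidates[best])) best = i;
    return candidates[best];
}
```

For example, among the field texts "John Smith", "+1 555 0100" and "j0hn@example.com", the last one is selected. Partial scores matter because OCR often garbles a character or two, so a near-miss should still outrank a name or phone number.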
This project uses OpenCV 3, Tesseract and Leptonica. On a Debian-based Linux distribution, the latter two can be installed from the packages tesseract-ocr-dev and libleptonica-dev.
./convert [-cut | -text] filename [filenames...]
  -cut      Only perform the card search and print the card's coordinates to stdout.
  -text     Run without the GUI and print the best guess to stdout.
  filename  Source image; several images may be given.
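A sketch of how the command line above could be parsed, assuming that running without a flag starts the GUI (interactive) mode, that a flag, if present, comes first, and that later arguments starting with `-` are errors. `Mode`, `Options` and `parseArgs` are hypothetical names, not the program's actual code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Assumption: no flag means the GUI (interactive) mode.
enum class Mode { Gui, Cut, Text };

struct Options {
    Mode mode = Mode::Gui;
    std::vector<std::string> files;  // one or more source images
};

// Parses the arguments after the program name, following
//   ./convert [-cut | -text] filename [filenames...]
// Returns std::nullopt if no image is given or a later argument looks like a flag.
std::optional<Options> parseArgs(const std::vector<std::string>& args) {
    Options opts;
    std::size_t i = 0;
    if (i < args.size() && args[i] == "-cut") {
        opts.mode = Mode::Cut;
        ++i;
    } else if (i < args.size() && args[i] == "-text") {
        opts.mode = Mode::Text;
        ++i;
    }
    for (; i < args.size(); ++i) {
        // -cut and -text are mutually exclusive; reject a second flag.
        if (!args[i].empty() && args[i][0] == '-') return std::nullopt;
        opts.files.push_back(args[i]);
    }
    if (opts.files.empty()) return std::nullopt;  // at least one image is required
    return opts;
}
```

In `main`, it would be called as `parseArgs(std::vector<std::string>(argv + 1, argv + argc))`, printing the usage line when the result is `std::nullopt`.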