Email Recognition with OpenCV

Problem

Given an image of a business card, extract the email address printed on it.

Solution

Start with a photo in which the card's edges are visible.

Detect edges with the Canny detector, then find contours in the edge map.

Choose the best contour and crop the image to its bounding box (possibly rotated).

Using morphological operations, find regions that look like text fields and isolate them.

Run OCR on every detected field to obtain its text. Finally, select the text that most closely resembles an email address. In interactive mode, simply recognise the text in the current field instead.
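The final selection step can be illustrated with a simple scoring function. The README does not define the similarity measure, so the regular expression, the weights and the names `email_score` and `best_email` below are hypothetical. The input is assumed to be the list of strings produced by OCR, one per detected field.

```python
import re

# Deliberately loose pattern: OCR output is noisy, so this is not a validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def email_score(text):
    """Score how much a string resembles an email address (higher is better)."""
    text = text.strip()
    if not text:
        return 0.0
    match = EMAIL_RE.search(text)
    if match:
        # A full match always beats a partial one; prefer fields where
        # the address makes up most of the text.
        return 1.0 + len(match.group()) / len(text)
    score = 0.0
    if "@" in text:
        score += 0.5
    if "." in text:
        score += 0.2
    return score


def best_email(candidates):
    """Pick the most email-like candidate and return the address if one is found."""
    best = max(candidates, key=email_score, default="")
    match = EMAIL_RE.search(best)
    return match.group() if match else best.strip()
```

For example, given the fields `["John Smith", "+1 555 0100", "E-mail: john.smith@example.com", "www.example.com"]`, `best_email` returns `john.smith@example.com`. The partial scores keep a field that OCR has slightly mangled, such as one with the `@` intact but a missing dot, ahead of fields with no email-like features at all.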

Dependencies

This project uses OpenCV 3, Tesseract and Leptonica. To install the latter two libraries on a Debian-based Linux distribution, install the packages tesseract-ocr-dev and libleptonica-dev.

Command-line options

./convert [-cut | -text] filename [filenames...]

-cut        Only perform card search. Outputs coordinates to stdout
-text       No GUI. Outputs best guess to stdout
filename    Source image. Supports multiple images
