point-to-define
Optical character recognition and feature detection are two important aspects of image analysis, and this project combines them. My goal was to make an application with which a human can directly interact: the user holds a piece of paper containing text in a foreign language in front of a camera and points at any word on it. The application recognizes whichever word is being pointed at, translates it into English, and displays the translation on the video output. This is useful for anyone who wants to translate words from a foreign language into English without having to type every word into an online dictionary; pointing at a word is significantly faster than typing it. I wrote this application in Python and used OpenCV.
Features
- Detect location at which user points
  - Use skin color histogram
  - Find finger by looking for contours in skin color region
- Detect paper region
  - Use paper color histogram
- OCR text on paper
- Translate word at which user points
  - Use Google Translate to find translation
Libraries
- OpenCV: image analysis, feature detection
- Tesseract: optical character recognition
- goslate: Python package for Google Translate
How it works
- Train paper histogram
- Train hand histogram
- Find contours
- Find defects of largest contour
- Find farthest point from center of hand
- Result showing user pointing at a word, which is translated
Videos (available on YouTube)
Requirements
OpenCV and Tesseract need to be installed.
Install OpenCV (on OS X):
pip install numpy
brew tap homebrew/science
brew install opencv
Install Tesseract (on OS X):
brew install tesseract
Installation
Set up virtualenv (optional but recommended):
virtualenv --no-site-packages venv
Install packages (if using virtualenv, source it beforehand):
pip install -r requirements.txt
If using virtualenv, you need to copy the OpenCV site-package files to the virtualenv site-package directory. You can do it like this:
cp /usr/local/lib/python2.7/site-packages/cv* ./venv/lib/python2.7/site-packages
If that doesn't work, you can find the location of the cv files by opening a Python console and typing:
import cv2
print cv2.__file__
Usage
- From inside the project directory, on the command line run `python main.py`, or optionally `python main.py -v <output_video>`, where `output_video` is the path at which to store the video output.
- Next, the application needs to be trained to recognize a hand and paper.
- Hold a piece of paper with text inside the green rectangle and then press the `P` key.
- Hold your hand so it is inside the green rectangles, and then press the `H` key.
- Point at any word on the paper for about a second and a translation will be displayed on screen.
- To quit, press the `Q` key.
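The "point for about a second" behavior implies a dwell check: the translation should only fire once the same word has been pointed at continuously for a hold period. A minimal, self-contained sketch of that logic (the class and its API are hypothetical, not the project's actual code):

```python
import time

class DwellDetector:
    """Trigger once the same word has been pointed at for `hold_seconds`."""

    def __init__(self, hold_seconds=1.0, clock=time.monotonic):
        self.hold = hold_seconds
        self.clock = clock    # injectable clock makes the logic testable
        self.word = None
        self.since = None

    def update(self, word):
        """Feed the currently pointed-at word (or None) once per frame.
        Returns the word once it has been held long enough, else None."""
        now = self.clock()
        if word != self.word:         # pointer moved to a different word
            self.word = word
            self.since = now
            return None
        if word is not None and now - self.since >= self.hold:
            return word
        return None
```

Per frame, the main loop would call `update()` with whatever word the fingertip currently maps to, and translate only when it returns a word.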
To do
- Add tests
- Add support for more languages; currently only German->English is included
- Test support for smaller and larger camera resolutions
- Use constants when finding regions of interest
- Only apply HSV to regions of interest
- Handle case where there is no internet connection