point-to-define
Optical character recognition and feature detection are two important aspects of image analysis, and this project combines them. My goal was to make an application with which a human can directly interact: the user holds a piece of paper containing text in a foreign language in front of a camera and points at any word on it. The application recognizes whichever word is being pointed at, translates it into English, and displays the translation on the video output. This is useful for anyone who wants to translate words from a foreign language into English without having to type every word into an online dictionary; pointing at a word is significantly faster than typing it. I wrote this application in Python and used OpenCV.
Features
- Detect location at which user points
  - Use skin color histogram
  - Find finger by looking for contours in skin color region
- Detect paper region
  - Use paper color histogram
- OCR text on paper
- Translate word at which user points
  - Use Google Translate to find translation
Libraries
- OpenCV: image analysis, feature detection
- Tesseract: optical character recognition
- goslate: Python package for Google Translate
How it works
- Train paper histogram
- Train hand histogram
- Find contours
- Find defects of largest contour
- Find farthest point from center of hand
- Result showing user pointing at a word, which is translated
Videos (available on YouTube)
Requirements
OpenCV and Tesseract need to be installed.
Install OpenCV (on OS X):
pip install numpy
brew tap homebrew/science
brew install opencv
Install Tesseract (on OS X):
brew install tesseract
Installation
Set up virtualenv (optional but recommended):
virtualenv --no-site-packages venv
Install packages (if using virtualenv, source it beforehand):
pip install -r requirements.txt
If using virtualenv, you need to copy the OpenCV site-package files to the virtualenv site-package directory. You can do it like this:
cp /usr/local/lib/python2.7/site-packages/cv* ./venv/lib/python2.7/site-packages
If that doesn't work, you can find the location of the cv files by opening a Python console and typing:
import cv2
print cv2.__file__
Usage
- From inside the project directory, on the command line run `python main.py`, or optionally `python main.py -v <output_video>`, where `output_video` is the path at which to store the video output.
- Next, the application needs to be trained to recognize a hand and paper.
- Hold a piece of paper with text inside the green rectangle and then press the `P` key.
- Hold your hand so it is inside the green rectangles, and then press the `H` key.
- Point at any word on the paper for about a second and a translation will be displayed on screen.
- To quit, press the `Q` key.
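The "point for about a second" behavior implies a dwell check: the translation should only fire once the same word has been pointed at continuously for a hold period. A minimal, self-contained sketch of that logic (the class and its API are hypothetical, not the project's actual code):

```python
import time

class DwellDetector:
    """Trigger once the same word has been pointed at for `hold_seconds`."""

    def __init__(self, hold_seconds=1.0, clock=time.monotonic):
        self.hold = hold_seconds
        self.clock = clock    # injectable clock makes the logic testable
        self.word = None
        self.since = None

    def update(self, word):
        """Feed the currently pointed-at word (or None) once per frame.
        Returns the word once it has been held long enough, else None."""
        now = self.clock()
        if word != self.word:         # pointer moved to a different word
            self.word = word
            self.since = now
            return None
        if word is not None and now - self.since >= self.hold:
            return word
        return None
```

Per frame, the main loop would call `update()` with whatever word the fingertip currently maps to, and translate only when it returns a word.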
To do
- Add tests
- Add support for more languages; currently only German->English is included
- Test support for smaller and larger camera resolutions
- Use constants when finding regions of interest
- Only apply HSV to regions of interest
- Handle case where there is no internet connection