Demo: https://www.youtube.com/watch?v=oH0ZkfFoeYU
This program uses Python 2 and OpenCV 2 to attempt to detect the user's hand gestures. Gestures can be mapped to different actions and new gestures can be trained.
Currently works best in low lighting with the hand as the brightest portion of the image.
Features:
- Detect a single hand and obtain contour
- Determine center of the detected hand
- Track the hand over multiple frames
- Detect possible gestures
- Basic classification of possible gestures
- Record mode (new gesture training)
- Hookable API
- Sample code
Yet to be implemented (in no particular order):
- Better hand detection
- Gesture training (improving old gestures)
- Finger detection
- Depth perception
Linux: The required modules are probably bundled with your favorite Linux distribution (e.g. Ubuntu, Debian, Linux Mint). However, in the event that they need to be installed, the following instructions can be followed:
- On Debian/Ubuntu/etc:
sudo apt-get install python
- Download and install NumPy:
sudo apt-get install python-numpy
- Download and install OpenCV 2 for Python: http://docs.opencv.org/doc/tutorials/introduction/linux_install/linux_install.html
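If in doubt, a quick sanity check (not part of the project) is to confirm that both modules are importable:

```python
# Quick check that NumPy and the OpenCV 2 Python bindings are installed
import numpy
import cv2
print(cv2.__version__)  # should report a 2.x version for OpenCV 2
```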
See testImplementation.py for a full example; a minimal usage sketch also follows the list below.
- Download these files and place all of the files from the `current_src` directory into your project's working directory (to allow easy importing of modules).
- The processor is provided as an object `GestureProcessor` in the GesturesApi file. To use it, simply add the following import statement: `from GesturesApi import GestureProcessor`
- Create an instance of `GestureProcessor`, such as `gp = GestureProcessor()`. You can optionally pass it a `.txt` file which contains coordinates defining a gesture. If no file is specified, default gestures will be loaded (generation code can be found in `defaultGesturesLoader.py`). Note: no checking is done for data integrity; it is assumed that the provided file meets the proper format specifications.
- Bind gestures as desired, using the `bind()` method: `gp.bind(index, fn)`. `index` can either be the integer index of the gesture (in the order that the gestures were loaded) or a string containing the exact name of the gesture. `fn` is a function object which takes no parameters; use closures as necessary (e.g. `gp.bind(index, lambda: self.fn())`).
- In the main loop, call `gp.process()`. This will grab the next camera image and update the information inside `gp`, including depth and palm center. If a gesture is detected, this will also call the action bound to it and update `gp.lastAction` with the name of the last gesture. Note: this call is expensive and will take anywhere between 2 and 5 ms on average, depending on the machine.
- You can record new gestures by calling `gp.saveNext()`. This will add the next new gesture to the list of gestures under a random name. The random name is then set as `gp.lastAction`, so the programmer can change it to something more useful if desired. Alternatively, the gesture will simply be the last one in `gp.gestures` and can be modified from there.
- Upon exit of the program, it is CRITICAL to call `gp.close()`. This will clean up created data and, importantly, release the camera. Failure to do so will leave the camera active after the program appears to have exited, making it impossible for other applications to bind to the camera (including new instances of the offending program).
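Putting the steps above together, a minimal usage sketch might look like the following. Only `GestureProcessor`, `bind()`, `process()`, `lastAction`, and `close()` come from the API described above; the gesture name "SwipeRight" and the `on_swipe` handler are illustrative placeholders.

```python
from GesturesApi import GestureProcessor

def on_swipe():
    # Placeholder action; bind any zero-argument callable you like
    print("Swipe detected")

gp = GestureProcessor()           # or GestureProcessor("myGestures.txt")
gp.bind("SwipeRight", on_swipe)   # by exact gesture name, or by integer index

try:
    while True:                   # your application's main loop
        gp.process()              # grabs a frame, updates gp, fires bound actions
        # gp.lastAction holds the name of the most recently matched gesture
except KeyboardInterrupt:
    pass
finally:
    gp.close()                    # CRITICAL: releases the camera
```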
The `process()` loop can be summarized as follows (illustrative sketches of the frame-processing and gesture-matching steps follow the list):
- Read from camera
- Convert the image to a binary image using thresholding
- Use OpenCV to extract contours from the image
- Find the largest contour in the detected list, which is assumed to be the hand contour
- Use the bounding box to determine the width and height of the contour
- Use OpenCV to find the convex hull and convexity defects of the contour
- Find the center of the hand by looking for the largest possible inscribed circle
- Use the radius of the inscribed circle to approximate the hand's distance from the camera
- Use the positions of the palm center between times when the hand is stationary to form a candidate gesture
- Compare the list of tracked points against all the template gestures:
  - Find the total path length of both the tracked and the template gesture
  - Determine how far along the total length each point falls
  - For each point in the tracked gesture, look for the two template points closest to that point's fraction of the total length
  - Linearly interpolate between the two template points to obtain a point to compare the tracked point against
  - Keep a running tally of the distance between each tracked point and its comparison point
- Find which of the template gestures has the lowest total distance and check whether it is below a reasonable threshold
- Initiate the bound callback function if the tracked gesture has matched a template
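For reference, the image-processing half of this pipeline can be sketched with the OpenCV 2 Python bindings roughly as follows. The threshold value, blur kernel, and function name are illustrative assumptions and are not taken from the project's source.

```python
import cv2
import numpy as np

def extract_hand(frame, thresh_value=70):
    # Grayscale + blur, then a binary threshold; works best when the hand is
    # the brightest region of the frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(gray, thresh_value, 255, cv2.THRESH_BINARY)

    # Extract contours and keep the largest one (assumed to be the hand)
    contours, _ = cv2.findContours(binary.copy(), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)

    # Bounding box gives the width and height of the contour
    x, y, w, h = cv2.boundingRect(hand)

    # Convex hull (as indices) and convexity defects of the contour
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)

    # Largest inscribed circle via the distance transform of the hand mask:
    # the brightest point is the palm center, its value the circle radius
    mask = np.zeros(binary.shape, np.uint8)
    cv2.drawContours(mask, [hand], -1, 255, -1)
    dist = cv2.distanceTransform(mask, cv2.cv.CV_DIST_L2, 5)  # cv2.DIST_L2 in OpenCV 3+
    _, radius, _, center = cv2.minMaxLoc(dist)

    return hand, (x, y, w, h), defects, center, radius
```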
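The comparison step can likewise be sketched in plain Python. The helper names below are illustrative, not the project's actual functions: each tracked point is mapped to its fraction of the total path length, the corresponding position on the template is found by linear interpolation, and the point-to-point distances are summed.

```python
import math

def cumulative_lengths(points):
    """Running arc length at each point, starting at 0."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths

def point_at_fraction(points, lengths, fraction):
    """Linearly interpolate the point at the given fraction of total length."""
    target = fraction * lengths[-1]
    for i in range(1, len(lengths)):
        if lengths[i] >= target:
            seg = lengths[i] - lengths[i - 1]
            t = 0.0 if seg == 0 else (target - lengths[i - 1]) / seg
            (x0, y0), (x1, y1) = points[i - 1], points[i]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return points[-1]

def gesture_distance(tracked, template):
    """Total distance between a tracked path and a template gesture."""
    t_len = cumulative_lengths(tracked)
    g_len = cumulative_lengths(template)
    total = 0.0
    for i, (x, y) in enumerate(tracked):
        fraction = t_len[i] / t_len[-1] if t_len[-1] else 0.0
        gx, gy = point_at_fraction(template, g_len, fraction)
        total += math.hypot(x - gx, y - gy)
    return total

def best_match(tracked, templates, threshold):
    """Index of the closest template, or None if no score is below threshold."""
    scores = [gesture_distance(tracked, tpl) for tpl in templates]
    best = min(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] <= threshold else None
```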