Hand Gesture Recognition Using MediaPipe (Extended Version)

This repository builds upon the work of Kazuhito Takahashi, whose original project can be found here, and on the English translation by Nikita Kiselov. This extended version enhances the functionality and documentation and adds a hand sign recognition model for the letters A to F.



This repository contains the following:

  • Sample program
  • Hand sign recognition model (TFLite)
  • Finger gesture recognition model (TFLite)
  • Training data for hand sign recognition and a notebook for training
  • Training data for finger gesture recognition and a notebook for training

Original Credits

This project is built upon two key works:

  1. The original hand-gesture-recognition-using-mediapipe project by Kazuhito Takahashi. You can find the original repository here.
  2. The English translation of the original project, made by Nikita Kiselov. You can find the translated repository here.

Requirements

  • mediapipe 0.8.1
  • OpenCV 3.4.2 or later
  • TensorFlow 2.3.0 or later
    tf-nightly 2.5.0.dev or later (only required to create a TFLite model from the LSTM model)
  • scikit-learn 0.23.2 or later (only required to display the confusion matrix)
  • matplotlib 3.3.2 or later (only required to display the confusion matrix)
  • pandas
  • numpy
  • seaborn

Demo

Here's how to run the demo using your webcam.

python app.py

The following options can be specified when running the demo.

  • --device
    Camera device number (default: 0)
  • --width
    Camera capture width (default: 960)
  • --height
    Camera capture height (default: 540)
  • --use_static_image_mode
    Whether to use the static_image_mode option for MediaPipe inference (default: unspecified)
  • --min_detection_confidence
    Detection confidence threshold (default: 0.5)
  • --min_tracking_confidence
    Tracking confidence threshold (default: 0.5)
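The options above map naturally onto Python's argparse module. The following is a minimal sketch, assuming app.py parses its options this way; the function name get_args is hypothetical here, and the defaults are copied from the list above rather than from app.py, so check the script if the two ever disagree.

```python
import argparse


def get_args(argv=None):
    """Parse the demo options (hypothetical sketch; app.py may differ)."""
    parser = argparse.ArgumentParser(description="Hand gesture recognition demo")
    parser.add_argument("--device", type=int, default=0, help="camera device number")
    parser.add_argument("--width", type=int, default=960, help="capture width in pixels")
    parser.add_argument("--height", type=int, default=540, help="capture height in pixels")
    # A plain flag: False unless it appears on the command line.
    parser.add_argument("--use_static_image_mode", action="store_true")
    parser.add_argument("--min_detection_confidence", type=float, default=0.5)
    parser.add_argument("--min_tracking_confidence", type=float, default=0.5)
    # argv=None makes argparse read sys.argv; passing a list is handy for testing.
    return parser.parse_args(argv)


args = get_args(["--device", "1", "--use_static_image_mode"])
print(args.device, args.width, args.use_static_image_mode)  # 1 960 True
```

On the command line the same options would be passed as, for example, python app.py --device 1 --width 1280 --height 720, provided the camera supports that resolution.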

Directory

│  app.py
│  keypoint_classification.ipynb
│  point_history_classification.ipynb
│  
├─model
│  ├─keypoint_classifier
│  │  │  keypoint.csv
│  │  │  keypoint_classifier.hdf5
│  │  │  keypoint_classifier.py
│  │  │  keypoint_classifier.tflite
│  │  └─ keypoint_classifier_label.csv
│  │          
│  ├─point_history_classifier
│  │  │  point_history.csv
│  │  │  point_history_classifier.hdf5
│  │  │  point_history_classifier.py
│  │  │  point_history_classifier.tflite
│  │  └─ point_history_classifier_label.csv
│  │
│  └─handsign_classifier
│      │  handsign.csv
│      │  handsigndraft.csv
│      │  handsign_classifier.hdf5
│      │  handsign_classifier.py
│      │  handsign_classifier.tflite
│      └─ handsign_classifier_label.csv
│          
└─utils
    └─cvfpscalc.py

app.py

This is a sample program for inference.
It can also collect training data for hand sign recognition (key points) and for finger gesture recognition (index fingertip coordinate history).

keypoint_classification.ipynb

This is a model training script for hand sign recognition.

point_history_classification.ipynb

This is a model training script for finger gesture recognition.

model/keypoint_classifier

This directory stores files related to hand sign recognition.
It contains the following files:

  • Training data (keypoint.csv)
  • Trained model (keypoint_classifier.tflite)
  • Label data (keypoint_classifier_label.csv)
  • Inference module (keypoint_classifier.py)
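The label file ties the classifier's numeric output back to a name: row N holds the label for class ID N. A minimal sketch of that lookup, assuming one label per row in the first column and a model that returns one score per class; load_labels and to_label are hypothetical helper names, and the label strings are illustrative.

```python
import csv
import io


def load_labels(f):
    """Return the first column of each non-empty row; row index = class ID."""
    return [row[0] for row in csv.reader(f) if row]


def to_label(scores, labels):
    """Pick the label whose class has the highest score (an argmax)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Illustrative label file contents for the three initial hand signs.
labels = load_labels(io.StringIO("open hand\nclosed hand\npointing\n"))
print(to_label([0.1, 0.7, 0.2], labels))  # closed hand
```

With a real file, opening it with encoding="utf-8-sig" and newline="" keeps a byte-order mark written by some editors out of the first label.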

model/point_history_classifier

This directory stores files related to finger gesture recognition.
It contains the following files:

  • Training data (point_history.csv)
  • Trained model (point_history_classifier.tflite)
  • Label data (point_history_classifier_label.csv)
  • Inference module (point_history_classifier.py)

utils/cvfpscalc.py

This is a module for FPS measurement.
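FPS is usually measured as a rolling average of recent frame durations, so the on-screen number does not jitter from frame to frame. The sketch below is a stand-in for utils/cvfpscalc.py, not its actual code: the class name FpsCalc and the buffer_len parameter are assumptions, and it uses time.perf_counter where an OpenCV program might use cv2.getTickCount.

```python
import time
from collections import deque


class FpsCalc:
    """Rolling-average FPS counter (a stand-in, not utils/cvfpscalc.py itself)."""

    def __init__(self, buffer_len=10, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        # Durations (seconds) of the most recent frames; old ones drop off.
        self._durations = deque(maxlen=buffer_len)

    def get(self):
        """Call once per frame; returns FPS averaged over the buffered frames."""
        now = self._clock()
        self._durations.append(now - self._last)
        self._last = now
        mean = sum(self._durations) / len(self._durations)
        return round(1.0 / mean, 2) if mean > 0 else 0.0


# Example with a fake clock that advances 0.25 s per frame.
_ticks = iter([0.0, 0.25, 0.5, 0.75])
calc = FpsCalc(clock=lambda: next(_ticks))
for _ in range(3):
    fps = calc.get()
print(fps)  # 4.0
```

In a capture loop, get() would be called once per frame; the injectable clock argument exists only to make the class easy to test.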

Training

For both hand sign recognition and finger gesture recognition, you can add or modify training data and retrain the model.

Hand sign recognition training

1. Training data collection

Press "k" to enter the mode that saves key points (displayed as "MODE:Logging Key Point").


Pressing any key from "0" to "9" appends the current key points to "model/keypoint_classifier/keypoint.csv" as shown below.
1st column: pressed number (used as the class ID); 2nd and subsequent columns: key point coordinates
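The logging flow described above (a mode key, then a digit that doubles as the class ID) can be sketched as follows. The mode constants, select_mode, and log_row are hypothetical names for illustration, and app.py may structure this differently; the key codes are assumed to come from cv2.waitKey.

```python
import csv
import io

KEYPOINT_MODE = 1  # hypothetical mode IDs, not necessarily app.py's values
POINT_HISTORY_MODE = 2


def select_mode(key, mode):
    """Map one key code to (class_id, mode); class_id is -1 if no digit was pressed."""
    class_id = -1
    if ord("0") <= key <= ord("9"):
        class_id = key - ord("0")
    elif key == ord("k"):
        mode = KEYPOINT_MODE
    elif key == ord("h"):
        mode = POINT_HISTORY_MODE
    return class_id, mode


def log_row(f, class_id, features):
    """Append one training row: the class ID first, then the feature values."""
    if 0 <= class_id <= 9:
        csv.writer(f).writerow([class_id, *features])


buf = io.StringIO()
class_id, mode = select_mode(ord("k"), 0)     # enter key point logging mode
class_id, mode = select_mode(ord("2"), mode)  # press "2" to label the sample
if mode == KEYPOINT_MODE:
    log_row(buf, class_id, [0.0, 0.0, 0.5, -0.25])
print(buf.getvalue().strip())  # 2,0.0,0.0,0.5,-0.25
```

With a real file, "model/keypoint_classifier/keypoint.csv" would be opened in append mode ("a") with newline="" so that csv.writer controls the line endings.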


The stored key point coordinates have gone through the following preprocessing, up to step ④.
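The preprocessing can be sketched in Python as follows, assuming the four steps are: pixel coordinates, coordinates relative to the wrist (landmark 0 in MediaPipe Hands), a flattened one-dimensional list, and normalization by the largest absolute value. pre_process_landmark is a hypothetical name; compare it with app.py before relying on it.

```python
def pre_process_landmark(points):
    """Turn pixel landmarks [(x, y), ...] into a normalized feature vector."""
    # Step 2 (assumed): make every point relative to the first landmark (the wrist).
    base_x, base_y = points[0]
    relative = [(x - base_x, y - base_y) for x, y in points]
    # Step 3 (assumed): flatten to [x0, y0, x1, y1, ...].
    flat = [v for xy in relative for v in xy]
    # Step 4 (assumed): scale into [-1, 1]; fall back to 1 for a degenerate input.
    scale = max(map(abs, flat)) or 1
    return [v / scale for v in flat]


print(pre_process_landmark([(10, 20), (14, 17), (6, 20)]))
# [0.0, 0.0, 1.0, -0.75, -1.0, 0.0]
```

Making the points relative to the wrist removes the hand's position in the frame, and dividing by the largest absolute value removes its apparent size, so the classifier mainly sees the hand's shape.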


In the initial state, three classes of training data are included: open hand (class ID: 0), closed hand (class ID: 1), and pointing (class ID: 2).
If necessary, add classes with ID 3 or higher, or delete existing rows from the CSV, to prepare your own training data.

2. Model training

Open "keypoint_classification.ipynb" in Jupyter Notebook and run all cells from top to bottom.
To change the number of classes in the training data, change the value of "NUM_CLASSES = 3"
and update the labels in "model/keypoint_classifier/keypoint_classifier_label.csv" accordingly.
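Since the class IDs in keypoint.csv, the rows of keypoint_classifier_label.csv, and NUM_CLASSES must agree, a quick consistency check before training can save a confusing run. This is a hypothetical helper, not part of the notebook; it assumes class IDs start at 0 and that the label file has one label per row.

```python
import csv
import io


def check_classes(training_rows, label_rows, num_classes):
    """Return a list of mismatches between training CSV, label CSV and NUM_CLASSES."""
    problems = []
    ids = {int(row[0]) for row in csv.reader(training_rows) if row}
    labels = [row[0] for row in csv.reader(label_rows) if row]
    # Every class ID must fit in the model's output layer (IDs 0..num_classes-1).
    if ids and max(ids) >= num_classes:
        problems.append(f"class ID {max(ids)} >= NUM_CLASSES ({num_classes})")
    # The label file needs exactly one row per class.
    if len(labels) != num_classes:
        problems.append(f"{len(labels)} labels but NUM_CLASSES = {num_classes}")
    return problems


data = io.StringIO("0,0.0,0.1\n3,0.2,0.3\n")  # a new class 3 was logged
label_file = io.StringIO("open hand\nclosed hand\npointing\n")
problems = check_classes(data, label_file, 3)
print(problems)  # ['class ID 3 >= NUM_CLASSES (3)']
```

Here the fix would be to set NUM_CLASSES = 4 and add a fourth label row.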

X. Model structure

The structure of the model defined in "keypoint_classification.ipynb" is shown in the following image.
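In equation form, and assuming the notebook defines a small fully connected network with two hidden ReLU layers, the classifier maps the preprocessed key points to class probabilities as below. The input size of 42 assumes 21 landmarks with x and y each; the hidden widths h_1 and h_2 and any dropout rates are set in the notebook (dropout is active only during training and is omitted here), and C stands for NUM_CLASSES.

```latex
\mathbf{p} = \operatorname{softmax}\big(W_3\,\operatorname{ReLU}(W_2\,\operatorname{ReLU}(W_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2) + \mathbf{b}_3\big),
\qquad \mathbf{x} \in \mathbb{R}^{42},\;
W_1 \in \mathbb{R}^{h_1 \times 42},\;
W_2 \in \mathbb{R}^{h_2 \times h_1},\;
W_3 \in \mathbb{R}^{C \times h_2},
\qquad \hat{c} = \arg\max_{c} p_c
```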

Finger gesture recognition training

1. Training data collection

Press "h" to enter the mode that saves the history of fingertip coordinates (displayed as "MODE:Logging Point History").


Pressing any key from "0" to "9" appends the current coordinate history to "model/point_history_classifier/point_history.csv" as shown below.
1st column: pressed number (used as the class ID); 2nd and subsequent columns: coordinate history
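The coordinate history behaves like a fixed-length queue: each frame appends the newest fingertip position and the oldest one falls off. A minimal sketch, assuming a history length of 16 points (which would give 32 coordinate columns per CSV row); the PointHistory class and the HISTORY_LENGTH constant are hypothetical.

```python
from collections import deque

HISTORY_LENGTH = 16  # hypothetical; app.py sets its own history length


class PointHistory:
    """Fixed-length buffer of the most recent fingertip (x, y) positions."""

    def __init__(self, length=HISTORY_LENGTH):
        # A deque with maxlen drops the oldest point once it is full.
        self._points = deque(maxlen=length)

    def add(self, point):
        self._points.append(point)

    def is_full(self):
        return len(self._points) == self._points.maxlen

    def as_list(self):
        return list(self._points)

    def flat(self):
        """Flatten to [x0, y0, x1, y1, ...], the assumed coordinate layout of a CSV row."""
        return [v for point in self._points for v in point]


history = PointHistory(length=3)
for p in [(1, 1), (2, 2), (3, 3), (4, 4)]:
    history.add(p)
print(history.as_list(), history.is_full())  # [(2, 2), (3, 3), (4, 4)] True
```

Logging only when is_full() returns True would keep every row the same width, which a fixed-size model input needs.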


The stored fingertip coordinates have gone through the following preprocessing, up to step ④.
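For the fingertip history, the preprocessing can be sketched as follows, assuming the steps are: pixel coordinates, coordinates relative to the oldest point in the history, division of x by the image width and y by the image height, and flattening. pre_process_point_history is a hypothetical name.

```python
def pre_process_point_history(points, image_width, image_height):
    """Normalize a fingertip history [(x, y), ...] given in pixels."""
    if not points:
        return []
    # Assumed: the oldest point in the history is the origin.
    base_x, base_y = points[0]
    flat = []
    for x, y in points:
        # Assumed: scale by the image size, not by the largest value.
        flat.append((x - base_x) / image_width)
        flat.append((y - base_y) / image_height)
    return flat


print(pre_process_point_history([(10, 10), (20, 5), (60, 35)], 100, 50))
# [0.0, 0.0, 0.1, -0.1, 0.5, 0.5]
```

Scaling by the image size rather than by the largest value presumably preserves how far the finger actually moved; normalizing by the largest value would blow small jitter of a stationary finger up to the same magnitude as a large movement.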


In the initial state, four classes of training data are included: stationary (class ID: 0), clockwise (class ID: 1), counterclockwise (class ID: 2), and moving (class ID: 3).
If necessary, add classes with ID 4 or higher, or delete existing rows from the CSV, to prepare your own training data.

2. Model training

Open "point_history_classification.ipynb" in Jupyter Notebook and run all cells from top to bottom.
To change the number of classes in the training data, change the value of "NUM_CLASSES = 4" and
update the labels in "model/point_history_classifier/point_history_classifier_label.csv" accordingly.

X. Model structure

The structure of the model defined in "point_history_classification.ipynb" is shown in the following image.
The LSTM variant of the model is shown in the following image.
To use it, change "use_lstm = False" to "use_lstm = True" (this requires tf-nightly, as of 2020/12/16).

Reference

Author

Kazuhito Takahashi (https://twitter.com/KzhtTkhs)

Translation

Nikita Kiselov (https://github.com/kinivi)

Sign Language Detection

Chen Wenlong (https://github.com/c-wenlong/hand-sign-classifier-unity-meta-quest-3)

License

hand-gesture-recognition-using-mediapipe is licensed under the Apache License 2.0.