Sign-Language
A very simple CNN project.
What I did here
- First, I created 10 gesture samples using OpenCV. For each gesture I captured 1200 grayscale images of 30x30 pixels each, which are stored in the gestures/ folder. The gestures/0/ folder contains 1200 blank images that signify the "none" gesture. Keeping this category raised my model's accuracy from a laughable 82% to 99%.
- Learned what a CNN is and how it works. Best resources were Tensorflow's official website and machinelearningmastery.net.
- Created a CNN that looks a lot like this MNIST classifying model, using both Tensorflow and Keras. If you want to add more gestures you might need to add your own layers and tweak some parameters; that you have to do on your own.
- Then used the model which was trained using Keras on a video stream.
There are a lot of details that I left out, but these are the basic and main steps.
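If you do add your own layers, it helps to check how far a 30x30 input shrinks as it passes through them. The sketch below uses the standard output-size formula for convolution and pooling layers. The layer stack in it is purely illustrative and is not the architecture used in cnn_tf.py or cnn_keras.py; the names output_size, trace_sizes and example_layers are made up for this example.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after one conv or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [input_size]
    for kernel, stride, padding in layers:
        sizes.append(output_size(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical stack, NOT taken from this repo:
# 5x5 conv (padding 2), 2x2 pool, 5x5 conv (padding 2), 2x2 pool.
example_layers = [(5, 1, 2), (2, 2, 0), (5, 1, 2), (2, 2, 0)]

print(trace_sizes(30, example_layers))  # [30, 30, 15, 15, 7]
```

A 30x30 image leaves little room for extra pooling layers, so it is worth running a check like this before adding one.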
Requirements
- Python 3.x
- Tensorflow 1.5
- Keras
- OpenCV 3.4
- h5py
- A good grasp of the above 5 topics along with neural networks. Refer to the internet if you have problems with those. I myself am just a beginner in those.
- A good CPU (preferably with a GPU).
- Patience.... A lot of it.
How to use this repo
Before using this repo, let me warn you about something. There is no interactive interface that tells you what to do, so you will have to figure out most of the stuff by yourself and change the scripts if the need arises. But here is the basic gist.
Creating a gesture
- First set your hand histogram. You do not need to do it again if you have already done it, unless the lighting conditions change. To do so, type the command given below and follow instructions 2-9 here.
python set_hand_hist.py
- Next, create your gestures with the command given below. When the program starts, you will have to enter the gesture number and the gesture name/text. Since no checks are implemented here, I suggest you do this carefully. An OpenCV window called "Capturing gestures" will then appear. In the webcam feed you will see a green window (inside which you will have to do your gesture) and a counter that counts the number of pictures stored.
python create_gestures.py
- Press 'c' when you are ready with your gesture. Capturing will begin after a few seconds. Move your hand a little bit here and there. After the counter reaches 1200 the window will close automatically.
- When you are done adding new gestures, run the load_images.py file once. You do not need to run it again unless you add a new gesture.
python load_images.py
- Do not forget to update the num_of_classes variable in the cnn_tf.py and cnn_keras.py files if you add any new gestures.
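To double-check the value you put in num_of_classes, you can count the gesture folders yourself. This is a minimal sketch that assumes the gestures/<number>/ layout described earlier, with one subfolder per gesture. It is not what load_images.py does internally, and list_labeled_images and count_classes are hypothetical helper names.

```python
import os


def list_labeled_images(gestures_dir="gestures"):
    """Return (image_path, label) pairs, using each subfolder name as the label."""
    pairs = []
    for label in sorted(os.listdir(gestures_dir)):
        class_dir = os.path.join(gestures_dir, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files lying next to the gesture folders
        for filename in sorted(os.listdir(class_dir)):
            pairs.append((os.path.join(class_dir, filename), label))
    return pairs


def count_classes(pairs):
    """Number of distinct labels that have at least one image."""
    return len({label for _, label in pairs})


# Only runs when a gestures/ folder exists in the current directory.
if os.path.isdir("gestures"):
    pairs = list_labeled_images("gestures")
    print(len(pairs), "images in", count_classes(pairs), "classes")
```

The count includes the gestures/0/ "none" folder, and an empty gesture folder is not counted because it contributes no images.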
Training a model
- Training can be done with either Tensorflow or Keras. To train using Tensorflow, run the cnn_tf.py file. To train using Keras, run the cnn_keras.py file.
python cnn_tf.py
python cnn_keras.py
- If you use Tensorflow you will have the checkpoints and the metagraph file in the tmp/cnn_model3 folder.
- If you use Keras you will have the model in the root directory by the name cnn_keras2.h5.
Recognizing gestures
Before going into much detail, I should say that I was not able to use the model trained using Tensorflow, because I do not know how to use it. I tried the predict() function of the Estimator API, but that loads the parameters into memory every time it is called, which is a huge overhead. Please help me with this if you can. The function for prediction using Tensorflow is tf_predict(), which you will find in the recognize_gesture.py file, but it is never used. This is why I ended up using the Keras model, as loading the model into memory and using it for prediction is super easy.
- For recognition start the recognize_gesture.py file.
python recognize_gesture.py
- You will have a small green box inside which you need to do your gestures.
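Whatever model you use, its output for a frame is a list of class probabilities that has to be turned back into gesture text. Here is a rough sketch of that step, assuming class 0 is the "none" gesture as described above. The function decode_prediction, the gesture_names list and the 0.7 threshold are all made up for illustration and do not come from recognize_gesture.py.

```python
def decode_prediction(probabilities, gesture_names, min_confidence=0.7):
    """Map class probabilities to gesture text, falling back to class 0."""
    # Index of the most probable class (a plain argmax).
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < min_confidence:
        return gesture_names[0]  # not confident enough: treat as "none"
    return gesture_names[best]


# Hypothetical names; index 0 stands for the blank "none" gesture.
names = ["", "hello", "bye"]
print(decode_prediction([0.1, 0.8, 0.1], names))
print(repr(decode_prediction([0.4, 0.35, 0.25], names)))
```

The threshold keeps the program from showing a gesture when the model is only guessing; tune it to your own model.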
Got a question?
If you have any questions that are bothering you, please contact me on my Facebook profile. Just do not ask me questions like where I live, who I work for, etc. Also no questions like what a given line does. If you think a line is redundant or can be removed to make the program better, then you can obviously ask me or make a pull request.