CSC492 - Machine learning with Kinect
This semester, we had the opportunity to work with the Kinect hardware, and we wanted to use the sensor to help with our daily lives. We shared two goals: waking up on time and getting more exercise every day. That led to the idea of a smart alarm that uses the Kinect sensor and keeps ringing until we have done our exercises. In the end, we built a Kinect-based smart alarm application that learns exercises through training and tracks the exercises a user performs.
The first thing we tried was a simple program that rings an alarm and uses the joint data provided by the Kinect sensor to count repetitions of hard-coded exercises such as jumping jacks (see video on left). To count repetitions, we coded a simple state machine that changes state when it detects changes in the angles between joints. This worked quite well after we tweaked some of the numbers. The main problem with this approach is that it does not scale: simple user movements can easily push it into the wrong state, and it then fails to detect the right exercise.
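As a rough illustration of the state-machine idea (not the project's actual code), the sketch below counts one repetition each time an arm angle rises above an "up" threshold and then falls back below a "down" threshold. The names `angle_deg` and `RepCounter`, the choice of the hip-shoulder-elbow angle, and the threshold values are all assumptions made for this example.

```python
import math

def angle_deg(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Each point is an (x, y) pair, e.g. hip, shoulder and elbow positions.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

class RepCounter:
    """Two-state machine: DOWN -> UP -> DOWN counts as one repetition."""

    def __init__(self, up_threshold=140.0, down_threshold=40.0):
        # Thresholds are illustrative guesses, not tuned values.
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.state = "DOWN"
        self.count = 0

    def update(self, angle):
        """Feed one per-frame angle; return the running repetition count."""
        if self.state == "DOWN" and angle > self.up_threshold:
            self.state = "UP"
        elif self.state == "UP" and angle < self.down_threshold:
            self.state = "DOWN"
            self.count += 1
        return self.count

counter = RepCounter()
for frame_angle in [20, 90, 150, 160, 90, 30, 150, 30]:
    counter.update(frame_angle)
```

The gap between the two thresholds gives some hysteresis, so jitter around a single cut-off does not register as extra repetitions. It also shows why the approach is fragile: any movement that happens to cross both thresholds is counted.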
-
We started by learning more about machine learning (ML) before diving deep into it. Initially, we looked for existing machine learning examples for this particular task. After some research, we found that the Hidden Markov Model (HMM) was the way to go.
-
"In a hidden markov model, a sequence is modeled as an output generated by a random process that progresses through discrete time steps. At each time step, the process outputs a symbol from a predefined alphabet and moves from one state to the next state. Both actions, the transition from state to state and the emission of an alphabet symbol, follow probabilistic distributions that define the model. These probabilities can be estimated using a training by example process."
-
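The generative process in the quoted definition can be sketched in a few lines. This is only an illustration under assumed names: `sample_hmm` and its parameters are hypothetical, and the two-state model at the bottom is made deliberately deterministic so that its behaviour is easy to follow.

```python
import random

def sample_hmm(start, transitions, emissions, alphabet, length, rng):
    """Generate `length` symbols from a discrete HMM.

    start[i]          probability of beginning in state i
    transitions[i][j] probability of moving from state i to state j
    emissions[i][k]   probability that state i outputs alphabet[k]
    """
    state = rng.choices(range(len(start)), weights=start)[0]
    symbols = []
    for _ in range(length):
        # At each time step: emit a symbol, then move to the next state.
        symbols.append(rng.choices(alphabet, weights=emissions[state])[0])
        state = rng.choices(range(len(transitions)), weights=transitions[state])[0]
    return symbols

# Toy model: state 0 always emits "a", state 1 always emits "b",
# and the two states strictly alternate.
toy = sample_hmm(
    start=[1.0, 0.0],
    transitions=[[0.0, 1.0], [1.0, 0.0]],
    emissions=[[1.0, 0.0], [0.0, 1.0]],
    alphabet=["a", "b"],
    length=4,
    rng=random.Random(0),
)
```

Training reverses this picture: given example sequences, it estimates the transition and emission probabilities that make those sequences likely.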
Although there were surprisingly few examples of gesture or exercise recognition with Kinect, we found a simple MATLAB app based on HMMs that takes a sequence of 60 x-y-z coordinates as training data and assigns it to a gesture. In the example, we can see 'O' being trained. The data is first put into 8 bins using k-means clustering (an unsupervised learning process), and an HMM (a supervised learning process) is then trained from the bins (i.e., at each state, the output symbol 'O' is emitted with some probability). This worked for us. We created a simple app that tracks the hand joint and returns 60 frames of data (note: Kinect runs at 30 fps, so this is about two seconds of motion). We used that data for both training and testing in the MATLAB app. It was not as accurate as we would have liked, because an HMM can only differentiate between the outputs it was trained on; it can never say that a sequence is none of them. To reject such sequences, the programmer of this app had used a simple probability threshold. We also didn't want to combine Kinect and machine learning into a single app, so we decided to move on from this approach.
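The binning step can be illustrated with a small k-means quantizer that turns each x-y-z point into one of k symbols. This is a sketch under stated assumptions, not the MATLAB app's code: the function names are hypothetical, and the centroids are initialised from evenly spaced points in the sequence so that the result is deterministic.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def kmeans(points, k, iterations=20):
    """Plain k-means; returns k centroids."""
    # Assumed initialisation: k points evenly spaced through the sequence.
    step = len(points) / k
    centroids = [tuple(points[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

def quantize(points, centroids):
    """Replace every point by the index (bin) of its nearest centroid."""
    return [nearest(p, centroids) for p in points]

# 60 frames of a hand tracing an 'O' (a unit circle in the x-y plane).
circle = [(math.cos(2 * math.pi * t / 60), math.sin(2 * math.pi * t / 60), 0.0)
          for t in range(60)]
centroids = kmeans(circle, 8)
symbols = quantize(circle, centroids)  # 60 bin indices, one per frame
```

The resulting symbol sequence is what a discrete HMM would be trained on, in place of the raw coordinates.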
-
We found the Accord Framework (Accord.NET), a machine learning library for C# that includes all sorts of different statistical models. César Souza has a good explanation and example of Hidden Markov Models with the Accord Framework here. To train with the skeleton (25 joints) we receive from the Kinect sensor, we had to find a way to incorporate all joints into one entity. We had two options: either change the HMM implementation in the Accord Framework so that it takes a set of 25 x-y coordinate pairs instead of a single x-y coordinate pair, or treat a skeleton as a combination of 25 distinct gestures (one for each joint). We picked the latter because it was easier to implement in the given timeframe.
-
Let's talk a bit about how training works. Each joint has its own HMM classifier, as discussed above. We create each classifier with a multivariate Gaussian emission distribution over 2 variables (the X and Y coordinates). We use 5 hidden states, arranged in a Forward (left-to-right) topology. We use Baum-Welch (an unsupervised learning process) as the learning algorithm for the classifier; it learns a single hidden Markov model object from a set of observation sequences. Baum-Welch is a type of Expectation-Maximization algorithm: it searches for transition and emission probabilities under which the model has a high likelihood of generating the training sequences it is given. Using the Kinect, we were able to use voice commands to start and stop the training process. Once a training sequence has been received, we compute the probability matrix using the learning algorithm described above.
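To make "likelihood of generating a sequence" concrete, the sketch below scores a sequence of (x, y) points under a 5-state model with Gaussian emissions, using the forward recursion in log space. It rests on several assumptions: the "Forward" setting is taken to mean a left-to-right state topology, the covariance is simplified to a diagonal one, and the means and variances are set by hand rather than learned with Baum-Welch. All function names are hypothetical and unrelated to the Accord API.

```python
import math

def forward_topology(n_states):
    """Left-to-right transitions: each state either stays or advances by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = A[i][i + 1] = 0.5
        else:
            A[i][i] = 1.0  # the last state is absorbing
    return A

def log_gaussian_2d(point, mean, var):
    """Log density of a 2-D Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(point, mean, var))

def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(sequence, A, means, variances):
    """log P(sequence | model), summed over all state paths."""
    n = len(A)
    # Left-to-right models start in state 0.
    alpha = [log_gaussian_2d(sequence[0], means[0], variances[0])]
    alpha += [-math.inf] * (n - 1)
    for point in sequence[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(A[i][j])
                       for i in range(n) if A[i][j] > 0])
            + log_gaussian_2d(point, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

A = forward_topology(5)
means = [(float(i), float(i)) for i in range(5)]  # hand-set, not learned
variances = [(1.0, 1.0)] * 5
rising = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
falling = list(reversed(rising))
score_rising = log_likelihood(rising, A, means, variances)
score_falling = log_likelihood(falling, A, means, variances)
```

With these hand-set parameters the rising sequence scores higher than the falling one, because the model's states expect the point to move from (0, 0) toward (4, 4). Baum-Welch adjusts the same parameters automatically so that the training sequences receive high scores.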
-
Collecting a testing sequence works in much the same way. To classify a testing sample, we evaluate the sequence against the classes using the Hidden Markov Model classifiers generated in the training step; each classifier computes and returns the most likely class for the given sequence. Since we treat an exercise as a collection of gestures (one per joint), we collect the probability distribution of joints vs. classes. Finally, we add up the probabilities for each class across all joints, and the class with the highest total is the class (or label) assigned to the sequence.
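The final voting step can be sketched as follows. The function name `classify_skeleton`, the exercise labels, and the probability values are made up for illustration; only three of the 25 joints are shown.

```python
def classify_skeleton(joint_scores):
    """Sum per-joint class probabilities and pick the class with the largest total.

    joint_scores maps joint name -> {class label: probability}.
    Returns (best_label, totals).
    """
    totals = {}
    for per_class in joint_scores.values():
        for label, probability in per_class.items():
            totals[label] = totals.get(label, 0.0) + probability
    best_label = max(totals, key=totals.get)
    return best_label, totals

# Hypothetical output of three per-joint classifiers for one test sequence.
scores = {
    "HandLeft":  {"jumping_jack": 0.7, "squat": 0.3},
    "HandRight": {"jumping_jack": 0.6, "squat": 0.4},
    "KneeLeft":  {"jumping_jack": 0.4, "squat": 0.6},
}
label, totals = classify_skeleton(scores)
```

Here the knee favours "squat", but the two hands outvote it, so the sequence is labelled "jumping_jack". Every joint has equal weight in this scheme, so a joint that barely moves during an exercise has as much say as one that defines it.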
-
Our original goal was to have this app count exercises in real time, so we tweaked the app to try to classify the current sequence at each new frame received (polling). Once we have conclusive evidence that the sequence belongs to a certain exercise, we accept it. This is particularly necessary because the training and testing sequences aren't guaranteed to have a fixed length. Below are some of the many problems we faced trying to achieve this goal.
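Before turning to those problems, the polling idea can be sketched as a buffer that is re-classified on every frame and cleared once a result is confident enough. This is a minimal sketch, not the app's code: `StreamingCounter`, the `classify` callable, the confidence measure, and all numeric defaults are assumptions for illustration.

```python
from collections import deque

class StreamingCounter:
    """Re-classify a growing frame buffer on every new frame."""

    def __init__(self, classify, min_frames=30, max_frames=120, min_confidence=0.8):
        # classify: callable taking a list of frames, returning (label, confidence)
        self.classify = classify
        self.buffer = deque(maxlen=max_frames)  # oldest frames drop off when full
        self.min_frames = min_frames
        self.min_confidence = min_confidence
        self.counts = {}

    def on_frame(self, frame):
        """Call once per Kinect frame; returns a label when one is accepted."""
        self.buffer.append(frame)
        if len(self.buffer) < self.min_frames:
            return None  # too little data to classify
        label, confidence = self.classify(list(self.buffer))
        if confidence < self.min_confidence:
            return None  # not conclusive yet, keep polling
        self.counts[label] = self.counts.get(label, 0) + 1
        self.buffer.clear()  # start collecting the next repetition
        return label

# Stand-in classifier: confidence simply grows with the number of frames seen.
def fake_classify(frames):
    return "squat", min(1.0, len(frames) / 10)

counter = StreamingCounter(fake_classify, min_frames=3, max_frames=10, min_confidence=0.5)
accepted = [counter.on_frame(i) for i in range(10)]
```

With the stand-in classifier, a repetition is accepted on every fifth frame. A real classifier would need a confidence measure that also rejects movements matching none of the trained exercises, which is the limitation noted earlier for HMM-based classification.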
-
Visit this page for information on setup and code structure.
Visit this page for video and report.