Expectation Maximization Algorithm

Table of Contents
  1. Background
  2. Model used for the EM algorithm
  3. Implementation of the EM algorithm
  4. Results obtained for different values of K
  5. Interpretation of results
  6. Contact
  7. License
  8. References

Background

The EM algorithm is commonly used to estimate the parameters that best represent a given dataset. It is an elegant and powerful method for finding maximum likelihood solutions for models with latent variables. It consists of two main steps, namely the E step (expectation step) and the M step (maximization step). In the E step, the current parameter values are used to evaluate the posterior probabilities, or responsibilities. These probabilities are then used in the M step to re-estimate the corresponding means and covariances. The algorithm terminates when either the log likelihood or the parameters of the model have converged. A more in-depth explanation of the EM algorithm can be found in the Bishop book [1]. A well-known related procedure is the popular K-means clustering algorithm, which can be viewed as a limiting case of EM for Gaussian mixtures.
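For the Gaussian case, and following the notation of Bishop [1], the two steps can be summarized as follows (reproduced here for reference):

```math
\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \qquad \text{(E step)}
```

```math
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{\top}, \qquad
\pi_k = \frac{N_k}{N} \qquad \text{(M step)}
```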

(back to top)

Model used for the EM algorithm

[Figure: definition of the mixture models used (image not rendered)]
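The figure above specifies the models; as a general point of reference, a K-component Gaussian mixture and its Poisson counterpart (the two families that appear in the results below) have densities of the form

```math
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
p(x) = \sum_{k=1}^{K} \pi_k \, \mathrm{Pois}(x \mid \lambda_k),
```

with mixing weights satisfying $\pi_k \geq 0$ and $\sum_k \pi_k = 1$.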

Implementation of the EM algorithm

[Figure: implementation of the EM algorithm (image not rendered)]
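Since the figure above does not render here, the sketch below shows one minimal way the E and M steps can be written for the Gaussian case. It is an illustrative implementation, not the project's actual code, and the function name `em_gmm` is hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Illustrative EM for a K-component Gaussian mixture; X has shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                      # mixing weights
    mu = X[rng.choice(n, size=K, replace=False)]  # means: K random data points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    log_liks = []
    for _ in range(n_iter):
        # E step: responsibilities gamma[i, k] proportional to pi_k * N(x_i | mu_k, sigma_k).
        dens = np.column_stack(
            [pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k]) for k in range(K)]
        )
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means, and covariances from the responsibilities.
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        # Stop once the log likelihood has converged.
        log_liks.append(np.log(dens.sum(axis=1)).sum())
        if len(log_liks) > 1 and abs(log_liks[-1] - log_liks[-2]) < tol:
            break
    return pi, mu, sigma, gamma, log_liks
```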

(back to top)

Results obtained for different values of K

Results for K=2

[Figures: Gaussian and Poisson mixture fits for K=2 (images not rendered)]

Results for K=3

[Figures: Gaussian and Poisson mixture fits for K=3 (images not rendered)]

Results for K=4

[Figures: Gaussian and Poisson mixture fits for K=4 (images not rendered)]
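Results like these could be reproduced by running the `em_gmm` sketch above once per value of K. The snippet below is a hypothetical driver on synthetic 2-D data (the project's real dataset is assumed elsewhere):

```python
import numpy as np

# Synthetic 2-D data with three visible groups, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in ([0, 0], [3, 3], [0, 3])])

for K in (2, 3, 4):
    pi, mu, sigma, gamma, log_liks = em_gmm(X, K)
    print(f"K={K}: converged log likelihood = {log_liks[-1]:.2f}")
```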

(back to top)

Interpretation of results

With a value of K=3, the Gaussian distributions are more evenly spread among the points, although the blue cluster does appear to contain more points than the others. If we consider the points by themselves, i.e., without their assigned clusters, we can already make out three plausible clusters, namely the ones produced by running our EM algorithm. This is why we believe that setting K=3 gives the most successful results, or at least the ones that make the most sense intuitively. The Poisson fit for K=3 tells essentially the same story: the points are more or less evenly distributed among the different Poisson distributions.


Finally, when setting K=4, the results are not very representative of the data. The points in the upper left of the diagram are split between the red and green clusters, producing Gaussian contour plots that are intertwined with each other, when in reality those points belong to a single cluster and should be attributed to only one Gaussian distribution. This happens because we explicitly tell our EM algorithm to assign the points to four clusters, so it performs a "forced" assignment, so to speak, which we believe does not yield a successful result. The same observation holds for the Poisson distributions.

Contact

Robert Rey - LinkedIn

Project Link: Expectation-Maximization

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

References

[1] Bishop, C. M., 2006. Mixture Models and EM. In: Pattern Recognition and Machine Learning. Springer, pp. 423-455.