This is a PLSA (Probabilistic Latent Semantic Analysis) implementation via the EM (Expectation-Maximization) algorithm.
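For orientation, here is a minimal, pure-Python sketch of how EM for PLSA is commonly set up. It is not taken from this repository's code: the function names `plsa` and `log_likelihood`, the parameters `n_topics`, `n_iter` and `seed`, and the dense `counts[d][w]` input layout are all assumptions made for illustration. The E-step computes P(z|d,w) proportional to P(z|d) * P(w|z), and the M-step re-estimates P(z|d) and P(w|z) from the count-weighted responsibilities.

```python
import math
import random


def _normalize_or_keep(new_row, old_row):
    """Return new_row scaled to sum to 1, or old_row if new_row has no mass."""
    total = sum(new_row)
    if total <= 0.0:
        return old_row
    return [x / total for x in new_row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM (illustrative sketch, hypothetical API).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z_d, p_w_z), where p_z_d[d][z] = P(z|d), p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random positive initialization; each row is normalized to sum to 1.
    p_z_d = [_normalize_or_keep([rng.random() + 1e-3 for _ in range(n_topics)], None)
             for _ in range(n_docs)]
    p_w_z = [_normalize_or_keep([rng.random() + 1e-3 for _ in range(n_words)], None)
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                # Accumulate the count-weighted responsibilities n(d,w) * P(z|d,w).
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_z_d[d][z] += r
                    new_w_z[z][w] += r
        # M-step: renormalize the accumulated responsibilities.
        p_z_d = [_normalize_or_keep(new, old) for new, old in zip(new_z_d, p_z_d)]
        p_w_z = [_normalize_or_keep(new, old) for new, old in zip(new_w_z, p_w_z)]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d,w) * log(sum_z P(z|d) * P(w|z))."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


if __name__ == "__main__":
    # Toy term counts: 4 documents over a 6-word vocabulary (made-up data).
    toy = [[4, 3, 0, 0, 1, 0],
           [3, 5, 1, 0, 0, 0],
           [0, 0, 4, 5, 0, 2],
           [0, 1, 3, 4, 0, 3]]
    p_z_d, p_w_z = plsa(toy, n_topics=2, n_iter=30)
    print(log_likelihood(toy, p_z_d, p_w_z))
```

Tracking the log-likelihood between iterations is a cheap sanity check, since EM should never decrease it.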
Current issues:
- The code is not well tested, so it may contain bugs. The test texts are in the folders ./texts and ./test.
- The code does not seem to work well with small datasets, such as ./test.
References:
EM introduction: http://blog.tomtung.com/2011/10/em-algorithm
PLSA introduction: http://blog.tomtung.com/2011/10/plsa
Note:
"A Tutorial on Probabilistic Latent Semantic Analysis" by Liangjie Hong is not a very good introduction to PLSA. It has some known errors.