Bayesian inference for Gaussian mixture models with some novel algorithms

PyBGMM: Bayesian inference for Gaussian mixture models

Overview

PyBGMM performs Bayesian inference for Gaussian mixture models and reduces over-clustering via the powered Chinese restaurant process (pCRP). Posterior inference uses collapsed Gibbs sampling.
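
To make the difference between the CRP and the pCRP concrete, here is a minimal sketch of the prior assignment probabilities. It assumes the pCRP form described in the cited paper, where an existing cluster of size n gets weight n^r and a new cluster gets weight alpha, so r = 1 recovers the ordinary CRP. The function name `assignment_prior` and its parameters are illustrative only and are not taken from this repository's code.

```python
def assignment_prior(counts, alpha=1.0, r=1.0):
    """Prior probabilities of joining each existing cluster or opening a new one.

    counts: sizes of the existing clusters (all positive).
    alpha:  concentration parameter (weight of a new cluster).
    r:      power; r = 1 gives the ordinary CRP, r > 1 the pCRP (assumed form).
    Returns len(counts) + 1 probabilities; the last entry is the new cluster.
    """
    weights = [n ** r for n in counts] + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


# Two existing clusters of sizes 8 and 2, concentration alpha = 1.
crp = assignment_prior([8, 2], alpha=1.0, r=1.0)   # unnormalized weights 8, 2, 1
pcrp = assignment_prior([8, 2], alpha=1.0, r=2.0)  # unnormalized weights 64, 4, 1
```

In this toy case the probability of joining the small cluster falls from 2/11 under the CRP to 4/69 with r = 2, which is the sense in which the power discourages small clusters.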

Code Structure

|-- GMM # base class for Gaussian mixture model
    |---- IGMM  # base class for infinite Gaussian mixture model
        |------ CRPMM     ## traditional Chinese restaurant process (CRP) mixture model
        |------ PCRPMM    ## powered Chinese restaurant process (pCRP) mixture model
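
The hierarchy above could be organized roughly as in the following sketch. Only the class names GMM, IGMM, CRPMM and PCRPMM come from the tree; every method, attribute and default value here (`log_prior_existing`, `log_prior_new`, `alpha`, `r`) is a hypothetical illustration of how the two models might differ only in their prior weight, not the repository's actual API.

```python
import math


class GMM:
    """Base class: stores the data and one cluster label per point (hypothetical)."""

    def __init__(self, data):
        self.data = list(data)
        self.assignments = [0] * len(self.data)


class IGMM(GMM):
    """Infinite mixture base: adds the concentration parameter alpha (hypothetical)."""

    def __init__(self, data, alpha=1.0):
        super().__init__(data)
        self.alpha = alpha

    def log_prior_existing(self, n_k):
        # Log prior weight of joining a cluster that currently holds n_k points.
        raise NotImplementedError

    def log_prior_new(self):
        # Log prior weight of opening a new cluster.
        return math.log(self.alpha)


class CRPMM(IGMM):
    """Traditional CRP: weight proportional to the cluster size."""

    def log_prior_existing(self, n_k):
        return math.log(n_k)


class PCRPMM(IGMM):
    """Powered CRP: weight proportional to the cluster size raised to r (assumed form)."""

    def __init__(self, data, alpha=1.0, r=1.5):
        super().__init__(data, alpha)
        self.r = r

    def log_prior_existing(self, n_k):
        return self.r * math.log(n_k)
```

Under this layout the sampler can live in the shared base class, and each subclass overrides only the prior term.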

Documentation

What is included:

  • Chinese restaurant process mixture model (CRPMM)

  • Powered Chinese restaurant process mixture model (pCRPMM)

Examples

Code        Description
CRPMM 1d    Chinese restaurant process mixture model for 1d data
CRPMM 2d    Chinese restaurant process mixture model for 2d data
pCRPMM 1d   Powered Chinese restaurant process mixture model for 1d data
pCRPMM 2d   Powered Chinese restaurant process mixture model for 2d data
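
For readers who want a feel for what the 1d examples do, the following is a minimal, standard-library sketch of one collapsed Gibbs sweep for 1d data. It is not the repository's implementation: it assumes a known observation variance `sigma2` and a Normal prior (`mu0`, `tau2`) on each cluster mean so that the mean can be integrated out in closed form, and it assumes the pCRP weight n^r for existing clusters. The function names and all default values are illustrative.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Normal(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gibbs_sweep(data, labels, alpha=1.0, r=1.0, mu0=0.0, tau2=10.0,
                sigma2=1.0, rng=random):
    """Resample every label once, with cluster means integrated out.

    labels is modified in place. r = 1 corresponds to the CRP, r > 1 to the pCRP.
    """
    # Sufficient statistics per cluster: label -> [count, sum of points].
    stats = {}
    for x, z in zip(data, labels):
        c = stats.setdefault(z, [0, 0.0])
        c[0] += 1
        c[1] += x

    for i, x in enumerate(data):
        # Remove point i from its current cluster.
        c = stats[labels[i]]
        c[0] -= 1
        c[1] -= x
        if c[0] == 0:
            del stats[labels[i]]

        # Log weight of each existing cluster: prior term + posterior predictive.
        keys = list(stats)
        log_w = []
        for k in keys:
            n, s = stats[k]
            prec = 1.0 / tau2 + n / sigma2
            post_mean = (mu0 / tau2 + s / sigma2) / prec
            log_w.append(r * math.log(n)
                         + log_normal_pdf(x, post_mean, 1.0 / prec + sigma2))

        # Log weight of a new cluster: alpha times the prior predictive.
        keys.append(max(stats, default=-1) + 1)
        log_w.append(math.log(alpha) + log_normal_pdf(x, mu0, tau2 + sigma2))

        # Sample a label from the normalized weights (max subtracted for stability).
        m = max(log_w)
        z = rng.choices(keys, weights=[math.exp(w - m) for w in log_w])[0]
        labels[i] = z
        c = stats.setdefault(z, [0, 0.0])
        c[0] += 1
        c[1] += x
    return labels


data = [-5.2, -4.9, -5.1, 5.0, 4.8, 5.3]
labels = [0] * len(data)          # start with all points in one cluster
rng = random.Random(0)
for _ in range(20):
    gibbs_sweep(data, labels, alpha=1.0, r=1.5, rng=rng)
```

A full model would typically also place a prior on the variance, which changes the predictive density but not the structure of the sweep.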

Dependencies

  1. See requirements.txt

License

MIT

Citation

This repository is based on the following research article:

  • Lu, Jun, Meng Li, and David Dunson. "Reducing over-clustering via the powered Chinese restaurant process." arXiv preprint arXiv:1802.05392 (2018).

References

  1. H. Kamper, A. Jansen, S. King, and S. Goldwater, "Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings", in Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2014.
  2. Murphy, Kevin P. "Conjugate Bayesian analysis of the Gaussian distribution." Technical report, 2007.
  3. Murphy, Kevin P. Machine learning: a probabilistic perspective. MIT press, 2012.
  4. Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
  5. Rasmussen, Carl Edward. "The infinite Gaussian mixture model." Advances in neural information processing systems. 2000.
  6. Tadesse, Mahlet G., Naijun Sha, and Marina Vannucci. "Bayesian variable selection in clustering high-dimensional data." Journal of the American Statistical Association 100.470 (2005): 602-617.