Polya

Hierarchical Bayesian Nonparametrics (Python/C++)

Primary language: C++. License: MIT.

Polya is a Python/C++ library for inference with Pólya urn models (including Chinese restaurant processes), with an emphasis on (hierarchical) Pitman-Yor processes. Dirichlet processes and Dirichlet-multinomial (and Beta-binomial) models could be handled as special cases with minor modifications. Nonparametric mixture modelling is also partially supported (for now, only when the number of components is bounded).
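To make the Pitman-Yor Chinese restaurant process concrete, here is a minimal standalone sketch of its seating rule. It is not Polya's API; the function name `pitman_yor_crp` and its parameters are hypothetical. Customer n+1 joins an existing table k with probability proportional to (n_k - d), or opens a new table with probability proportional to (θ + d·K), where K is the current number of tables. Setting the discount d to 0 recovers the Dirichlet-process CRP.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, rng=None):
    """Seat customers one at a time in a Pitman-Yor Chinese restaurant.

    Hypothetical helper (not part of Polya). Returns a list of table sizes.
    With discount == 0 this is the Dirichlet-process CRP.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")
    rng = rng or random.Random()
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        if n == 0:
            # The first customer always opens a new table.
            tables.append(1)
            continue
        # Unnormalised weights: (n_k - d) for each existing table k and
        # (theta + d * K) for a new table; together they sum to n + theta.
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0, rng=random.Random(0))
print(len(sizes), "tables for", sum(sizes), "customers")
```

With d > 0 the number of occupied tables grows as a power of the number of customers rather than logarithmically, which is the heavy-tailed behaviour that makes Pitman-Yor processes a good fit for word frequencies.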

Example implementations of the hierarchical Pitman-Yor language model (HPYLM), the doubly-HPYLM (DHPYLM), and the PYP-HMM are also provided:
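The HPYLM predicts a word from a context u by interpolating the restaurant for u with its parent context (u with the oldest word dropped), recursing down to a uniform base distribution. The sketch below implements that standard predictive rule from given seating statistics; it does not sample seating arrangements, and the function name and the dict-of-dicts layout for customer counts c_uw and table counts t_uw are hypothetical, not taken from `hpylm.py`.

```python
def predictive_prob(word, context, customers, tables, discounts, strengths, vocab_size):
    """P(word | context) under a hierarchical Pitman-Yor language model.

    Hypothetical helper (not part of Polya).
    context:    tuple of preceding words; its parent is context[1:].
    customers:  {context: {word: c_uw}}, customers eating `word` in `context`.
    tables:     {context: {word: t_uw}}, tables serving `word` in `context`.
    discounts / strengths: per-level d and theta, indexed by len(context).
    """
    # Back off to the parent restaurant, or to the uniform base at the root.
    if context:
        parent = predictive_prob(word, context[1:], customers, tables,
                                 discounts, strengths, vocab_size)
    else:
        parent = 1.0 / vocab_size
    d = discounts[len(context)]
    theta = strengths[len(context)]
    cust = customers.get(context, {})
    tabs = tables.get(context, {})
    c_u = sum(cust.values())
    if c_u == 0:
        # An empty restaurant defers entirely to its parent.
        return parent
    t_u = sum(tabs.values())
    c_uw = cust.get(word, 0)
    t_uw = tabs.get(word, 0)
    return (c_uw - d * t_uw + (theta + d * t_u) * parent) / (theta + c_u)


# Toy bigram statistics over the vocabulary {a, b, c}.
customers = {("a",): {"b": 2, "c": 1}, (): {"b": 1, "c": 1}}
tables = {("a",): {"b": 1, "c": 1}, (): {"b": 1, "c": 1}}
p = predictive_prob("b", ("a",), customers, tables, [0.5, 0.7], [1.0, 1.0], 3)
print(round(p, 4))
```

Because the discounted mass (θ + d·t_u) is handed to a normalised parent, the predictive distribution sums to one over the vocabulary at every level.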

$ cd data && python brown.py && python sou.py && cd ..
$ ./hpylm.py < data/brown.reduc.txt > hpylm.log
Perplexity: 231.818141085
$ cat data/{sou.norm.train,brown.norm,sou.norm.test}.txt | \
    ./hpylm.py 3 `wc -w < data/sou.norm.test.txt` > union.log
Perplexity: 169.793801556
$ ./dhpylm.py 3 data/{sou.norm.test,sou.norm.train,brown.norm}.txt > dhpylm.log
Perplexity: 159.532152794
$ ./pyphmm.py < data/brown.tagged.simp.txt > pyphmm.log
M-1: 81.1001970389
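The scripts above report perplexity for the language models and an M-1 score for the PYP-HMM. The sketch below shows how these metrics are commonly computed; it assumes that M-1 denotes many-to-one tagging accuracy (each induced state is mapped to the gold tag it most often co-occurs with) reported as a percentage. Whether the scripts compute them exactly this way is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def perplexity(log_probs):
    """exp of the average negative natural-log probability per test token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def many_to_one_accuracy(induced, gold):
    """M-1 accuracy (%): map each induced state to its most frequent gold tag."""
    if len(induced) != len(gold):
        raise ValueError("sequences must have equal length")
    cooc = defaultdict(Counter)  # cooc[state][tag] = co-occurrence count
    for state, tag in zip(induced, gold):
        cooc[state][tag] += 1
    # Each state contributes the count of its best-matching gold tag.
    correct = sum(c.most_common(1)[0][1] for c in cooc.values())
    return 100.0 * correct / len(gold)


print(perplexity([math.log(0.25)] * 4))
print(many_to_one_accuracy([0, 0, 1, 1], ["N", "N", "V", "N"]))
```

A model that assigns probability 1/k to every test token has perplexity k, so lower is better; M-1 lets several induced states map to the same gold tag, which favours models with many states.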