mattjj/pyhsmm

Using particle Gibbs

algebravic opened this issue · 0 comments

I just found pyhsmm last week. First a minor issue:

It looks like future is not installed by default in Anaconda's distribution of Python, so pip install pyhsmm fails unless I first install future manually.
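
In other words, this is what works for me (just noting it for anyone else who hits the same thing):

pip install future
pip install pyhsmm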

Second, I was looking at the paper "The Infinite Hidden Markov Model" by Beal et al. The first example they give is the string 30*'ABCDEFEDCB', and they remark that the most parsimonious HMM modelling it has 10 states. I'm assuming that WeakLimitHDPHMM will essentially run their algorithm (of course I have to give a maximum number of states). I defined the following function:

import numpy as np
import pyhsmm
from pyhsmm.util.text import progprint_xrange

def trainHDPHMM(data, Nmax, iterations=100, alpha_0=1.0, alpha_size=None):
    # infer the alphabet size from the data unless it is given explicitly
    if alpha_size is None:
        alpha_size = 1 + max(data)
    print("Alphabet size=%d" % alpha_size)
    # one categorical observation distribution per (truncated) state
    obs_hyperparams = {
        'alpha_0': alpha_0,
        'K': alpha_size
        }
    obs_distns = [pyhsmm.distributions.Categorical(**obs_hyperparams)
                  for state in range(Nmax)]
    # weak-limit (truncated) HDP-HMM with gamma priors on the
    # concentration parameters
    posteriormodel = pyhsmm.models.WeakLimitHDPHMM(
        alpha_a_0=1., alpha_b_0=0.25,
        gamma_a_0=1., gamma_b_0=0.25,
        init_state_concentration=1.,
        obs_distns=obs_distns)
    posteriormodel.add_data(data)
    for idx in progprint_xrange(iterations):
        posteriormodel.resample_model()
    return posteriormodel

When I call this using

model = trainHDPHMM(np.array([ord(c) - ord('A') for c in 30*'ABCDEFEDCB']), 40)

it sometimes dies (depending on the random seed) with a traceback like the one below, even though I've set np.seterr(divide='ignore'):

File "", line 1, in
File "test_hmm.py", line 24, in trainHDPHMM
posteriormodel.resample_model()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 441, in resample_model
self.resample_parameters()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 447, in resample_parameters
self.resample_trans_distn()
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/models.py", line 457, in resample_trans_distn
self.trans_distn.resample([s.stateseq for s in self.states_list])
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 378, in resample
stateseqs=stateseqs,trans_counts=trans_counts)
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 321, in resample
stateseqs=stateseqs,trans_counts=trans_counts)
File "/u/victor/.local/lib/python2.7/site-packages/pyhsmm-0.1.6-py2.7-linux-x86_64.egg/pyhsmm/internals/transitions.py", line 85, in resample
distn.resample(counts)
File "/u/victor/.local/lib/python2.7/site-packages/pybasicbayes/distributions/multinomial.py", line 107, in resample
self.weights = np.random.dirichlet(self.alphav_0 + counts)
File "mtrand.pyx", line 4744, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38703)
File "mtrand.pyx", line 4751, in mtrand.RandomState.dirichlet (numpy/random/mtrand/mtrand.c:38610)
ZeroDivisionError: float division

I see that this issue was brought up in numpy/numpy#5851 over 2 years ago, but it still doesn't seem to have been fixed (I'm running numpy version 1.31.1). Is there a workaround (or fix) for this?
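
As far as I can tell from the numpy issue, the error happens when every entry of alphav_0 + counts is so small that all of the underlying Gamma draws come out as exactly zero and the normalization divides by zero. The workaround I've been considering is to monkeypatch the call shown in the traceback with something like the sketch below (safe_dirichlet is my own name, not anything in numpy or pybasicbayes), but I'd rather hear about a proper fix:

import numpy as np

def safe_dirichlet(alphav):
    # Sample a Dirichlet via independent Gamma draws, handling the case
    # where every draw underflows to zero (which is what makes
    # np.random.dirichlet raise ZeroDivisionError for tiny alphas).
    g = np.random.gamma(alphav)
    total = g.sum()
    if total == 0.:
        # in the small-alpha limit all mass collapses onto one component;
        # pick that component proportionally to alphav
        out = np.zeros_like(alphav, dtype=float)
        out[np.random.choice(len(alphav), p=alphav/alphav.sum())] = 1.
        return out
    return g / total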

But now to what I'm really interested in:

In the paper "Particle Gibbs for Infinite Hidden Markov Models" by Tripuraneni et. al. (in NIPS 2015) they describe a method using particle Gibbs to resample the state trajectory which, they say, greatly sped up apparent convergence. On one model that I'm running I find that I'm still improving, very gradually, even after 20,000 sampling iterations, so I'm interested in trying it. So, what changes would I need to make (more importantly, where) to pyshmm and/or pybasicbayes to add using particle gibbs as an option?