dm13450/dirichletprocess

Suspected Inconsistency in PosteriorClusters function.


if (!missing(ind)) {

In this if statement, the two code paths assign very different versions of pointsPerCluster. When ind is specified, we get

pointsPerCluster <- dpobj$weightsChain[[ind]]

When ind is not specified, we get

pointsPerCluster <- dpobj$pointsPerCluster

I believe the former is normalized (the weights sum to 1) while the latter holds raw counts. The difference is clearest when you compare the output of PosteriorClusters without an index against its output with the index of the last sample:

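library(dirichletprocess)

# No seed is set, so the printed values below are from one run and will vary
y <- rnorm(1000)
dp <- DirichletProcessGaussian(y)
dp <- Fit(dp, 50)
# Without ind: the dpobj$pointsPerCluster path (raw counts)
print(round(head(PosteriorClusters(dp)$weights), 3))
# [1] 0.981 0.014 0.003 0.001 0.000 0.000
# With ind = 50 (the last iteration): the dpobj$weightsChain[[ind]] path
print(round(head(PosteriorClusters(dp, 50)$weights), 3))
# [1] 0.311 0.000 0.000 0.000 0.068 0.036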
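Why normalized weights give such different output can be illustrated with base R alone. This is a simplified sketch, assuming the posterior cluster weights are drawn from a Dirichlet distribution whose parameters are the per-cluster sizes. It ignores the concentration parameter and any mass reserved for new clusters, and the helper rdirichlet_once and the cluster sizes are hypothetical, not taken from dirichletprocess. Counts and proportions give the same expected weights, but proportions sum to 1, so the draws are far more diffuse.

```r
# Hypothetical helper (not part of dirichletprocess): one draw from a
# Dirichlet distribution, built by normalizing independent gamma variates.
rdirichlet_once <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

set.seed(1)
N <- 1000
counts <- c(980, 15, 3, 2)   # made-up cluster sizes summing to N
props  <- counts / N         # the same information, normalized to sum to 1

# 2000 draws with raw counts vs. normalized proportions as parameters
from_counts <- t(replicate(2000, rdirichlet_once(counts)))
from_props  <- t(replicate(2000, rdirichlet_once(props)))

# Both have expected first weight 0.98, but very different spread
sd_counts <- sd(from_counts[, 1])
sd_props  <- sd(from_props[, 1])
print(round(c(counts = sd_counts, props = sd_props), 3))
```

Under this assumption, a single draw with normalized parameters can easily put a weight like 0.311 on the largest cluster, much as in the ind = 50 output.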

What's the right thing to do here? In the first code path (ind specified), should pointsPerCluster be set to weightsChain[[ind]] * N, where N is the sample size?
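The proposed rescaling is simple arithmetic. A minimal sketch, assuming each stored weights vector is the cluster counts divided by the sample size; counts_from_weights is a hypothetical helper, not the package's actual fix.

```r
# Hypothetical helper (not from dirichletprocess): turn a vector of
# normalized cluster weights back into per-cluster counts.
counts_from_weights <- function(weights, n) {
  # weights are counts / n, so multiplying by n undoes the normalization;
  # round() removes floating-point noise such as 610.9999999
  round(weights * n)
}

n <- 1000
weights <- c(0.611, 0.250, 0.139)   # made-up weights summing to 1
counts <- counts_from_weights(weights, n)
print(counts)
# [1] 611 250 139
```

Rounding matters only if downstream code expects whole-number counts; the unrounded product is already on the right scale for Dirichlet parameters.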

Yeah, that is a bug. I've fixed it and pushed the update, so please download the latest version (0.2.2.900).
Thanks for catching it and bringing it to my attention!