cpsievert/LDAvis

KL-Divergence Implementation does not handle 0 probabilities

carlosparadis opened this issue · 3 comments

Calling createJSON throws the following error:

Error in stats::cmdscale(dist.mat, k = 2) : NA values not allowed in 'd'

I traced it down to:

LDAvis/R/createJSON.R

Lines 298 to 304 in 51bb51e

# first, we compute a pairwise distance between topic distributions
# using a symmetric version of KL-divergence
# http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
jensenShannon <- function(x, y) {
  m <- 0.5*(x + y)
  0.5*sum(x*log(x/m)) + 0.5*sum(y*log(y/m))
}

To reproduce the issue:

Reproducible dataset

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

Using LDAvis implementation shown at the start of this issue:

> jensenShannon(x=x,y=y)
[1] 0.003583677
> jensenShannon(x=x,y=b)
[1] NaN
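
To make the link between this NaN and the cmdscale error explicit, here is a minimal sketch. It reuses the jensenShannon definition quoted above. pairwiseDist is a hypothetical helper written only for this illustration (it is not how LDAvis builds dist.mat), and the example vectors are used as given even though they do not sum to 1.

```r
# jensenShannon as quoted above from createJSON.R
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

# The zero entry in b produces 0 * log(0) = 0 * -Inf, which is NaN in R
zero_term <- 0 * log(0)
print(zero_term)

# Hypothetical helper: apply f to every pair of rows of phi
pairwiseDist <- function(phi, f) {
  n <- nrow(phi)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- f(phi[i, ], phi[j, ])
    }
  }
  d
}

dist.mat <- pairwiseDist(rbind(x, y, b), jensenShannon)

# cmdscale refuses any distance matrix that contains NA or NaN
err <- tryCatch(
  stats::cmdscale(dist.mat, k = 2),
  error = function(e) conditionMessage(e)
)
print(err)
```

A single zero in any topic's distribution is enough to put NaN entries in the distance matrix, so the failure does not depend on how many topics are affected.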

The same test, using cosine function from lsa package:

> cosine(x=x,y=y)
          [,1]
[1,] 0.9897595
> cosine(x=x,y=b)
          [,1]
[1,] 0.7687061
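
The cosine values can also be checked without the lsa package. A minimal base-R sketch follows, where cosineSim is a hypothetical name chosen for this illustration. It shows why the zero entry is harmless here: it only contributes 0 to the sums, and no logarithm is taken.

```r
# Hypothetical base-R cosine similarity, for checking the values above
cosineSim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

sim_xy <- cosineSim(x, y)
sim_xb <- cosineSim(x, b)  # finite despite the zero entry in b
print(c(sim_xy, sim_xb))
```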

The NaN looks like an implementation detail (in R, 0 * log(0) evaluates to NaN) rather than a principled reason to prefer one measure over the other. Is that correct?
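
If it is only an implementation detail, one possible workaround is to adopt the usual convention that 0 * log(0) counts as 0 and skip those terms. The sketch below assumes non-negative inputs. jensenShannonSafe and klTerm are hypothetical names, and this is not necessarily what LDAvis does or should do.

```r
# Hypothetical zero-safe variant of the function quoted at the top of this issue
jensenShannonSafe <- function(x, y) {
  m <- 0.5 * (x + y)
  # Sum p * log(p / q) over entries with p > 0 only (0 * log(0) treated as 0).
  # Wherever p > 0, the midpoint q is also > 0, so no division by zero occurs.
  klTerm <- function(p, q) {
    keep <- p > 0
    sum(p[keep] * log(p[keep] / q[keep]))
  }
  0.5 * klTerm(x, m) + 0.5 * klTerm(y, m)
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

js_xy <- jensenShannonSafe(x, y)  # no zeros: same value as the original function
js_xb <- jensenShannonSafe(x, b)  # finite instead of NaN
print(c(js_xy, js_xb))
```

Another common option is to add a small constant to every probability and renormalize, but that changes all of the distances slightly, whereas skipping zero terms leaves the zero-free cases unchanged.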

Done in c7234d7

Hi there, I'm still getting this error in v0.3.5. Is this the most up-to-date version?