cpsievert/LDAvis

dist() with jensenShannon returns NaN

lmkirvan opened this issue · 6 comments

I really enjoy this package and appreciate your work on it. I've previously used it successfully, but updated the package recently and now get an error that I previously had not encountered.

I can't quite figure out why (the Jensen-Shannon distance function looks okay), but

`jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x/m)) + 0.5 * sum(y * log(y/m))
}

dist.mat <- proxy::dist(x = parems$phi, method = jensenShannon)`

returns NaN when run on my phi.

Hi - I'm glad you're finding the package useful!

Regarding the NaN - is it possible you have an NA value in phi? Also - a long shot here - did you mean to write params$phi rather than parems$phi, i.e. is it just a typo?

If you want to share the data to make the error reproducible, that might also help us troubleshoot.

-k

I can save the values of phi if that would be helpful. Let me know and I can send it to you via email or upload it to a GitHub repo. The phi I'm using does not include any NA values and all rows sum to 1. There are several zero values (because of rounding), but I understood that wouldn't be a problem. I think the problem is with the distance function as written.

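jsPCA2 <- function(phi) {
  # Defined as in jsPCA, but not used below
  jensenShannon2 <- function(x, y) {
    m <- 0.5 * (x + y)
    0.5 * sum(x * log(x/m)) + 0.5 * sum(y * log(y/m))
  }
  # Swapped in a different distance metric for comparison
  dist.mat <- proxy::dist(x = phi, method = 'Jaccard')
  # Return early to inspect the distances; the lines below are never reached
  return(dist.mat)
  pca.fit <- stats::cmdscale(dist.mat, k = 2)
  data.frame(x = pca.fit[, 1], y = pca.fit[, 2])
}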

As you can see, I edited the jsPCA function to use another distance metric (chosen at random), and that version does not return NaN.

I've also spotted a question on SO that looks like someone is experiencing
a similar problem.

http://stackoverflow.com/questions/35830008/r-ldavis-k-2-createjson-error

Let me know if you'd like the phi file.

Thanks for your help.

-L


NaN is returned when there are 0 values in the phi matrix. That's why you have to add a constant to every value in the phi matrix, as is done in the tutorial.
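
To illustrate, here is a minimal reproduction using two made-up topic rows. The vectors `x` and `y` and the constant `eps` are illustrative assumptions, not the reporter's data or the tutorial's exact values.

```r
# The Jensen-Shannon function as posted at the top of this issue.
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# Hypothetical topic-term rows; each sums to 1 and x contains an exact zero.
x <- c(0.5, 0.5, 0.0)
y <- c(0.2, 0.3, 0.5)

0 * log(0)            # 0 * -Inf, which R evaluates to NaN
jensenShannon(x, y)   # NaN: the third term of the first sum is 0 * log(0)

# Adding a small constant and renormalising each row
# (eps is an arbitrary illustrative value).
eps <- 1e-8
x2 <- (x + eps) / sum(x + eps)
y2 <- (y + eps) / sum(y + eps)
jensenShannon(x2, y2) # finite
```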

Hi Marcin, thanks for the great package.
I don't think the solution should be to add a constant. The problem appears because R evaluates 0*log(0) as NaN. Mathematically, however, the limit of x log(x) as x goes to 0 is 0, so the corresponding summand in the jensenShannon metric should be 0. For example, you could replace

sum(x * log(x/m))

by

sum(ifelse(x == 0, 0, x * log(x/m)))
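
Put together, the guarded function might look like the sketch below. `kl_term` and `jensenShannonSafe` are hypothetical names introduced here for illustration; this is not code from the package.

```r
# Sketch of a zero-safe Jensen-Shannon divergence.
# kl_term() is a hypothetical helper introduced for illustration.
kl_term <- function(p, m) {
  # Entries with p == 0 contribute 0, the limit of p * log(p) as p -> 0.
  sum(ifelse(p == 0, 0, p * log(p / m)))
}

jensenShannonSafe <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * kl_term(x, m) + 0.5 * kl_term(y, m)
}

# Made-up rows; x contains an exact zero.
x <- c(0.5, 0.5, 0.0)
y <- c(0.2, 0.3, 0.5)
jensenShannonSafe(x, y)  # finite and non-negative
jensenShannonSafe(x, x)  # 0 for identical rows
```

It could then be passed to `proxy::dist` in the same way as the original function, e.g. `proxy::dist(x = phi, method = jensenShannonSafe)`.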

Best, Maren

@Maren-Eckhoff that's a great solution.

I'm not the owner of the package, but @cpsievert is and might like to know about this improvement.

I have encountered the same issue.

I then applied the fix suggested above by @Maren-Eckhoff (thanks!). It works in most cases but still fails in some, where jsPCA returns the error `infinite or missing values in 'x'`.
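
The thread does not establish which inputs trigger this error. As a rough diagnostic sketch only, the checks below look for entries in `phi` that would still break the guarded function: NA/NaN or infinite values, negative values, or rows that do not sum to 1. `check_phi` is a hypothetical helper and the example matrix is made up.

```r
# Hypothetical helper: report properties of phi that can yield NaN/Inf distances.
check_phi <- function(phi, tol = 1e-8) {
  list(
    # NA, NaN, Inf or -Inf anywhere in the matrix
    any_non_finite = any(!is.finite(phi)),
    # log() of a negative ratio is NaN
    any_negative   = any(phi < 0, na.rm = TRUE),
    # each row (topic) should be a probability distribution;
    # isTRUE() makes rows containing NA count as a failure
    rows_sum_to_1  = isTRUE(all(abs(rowSums(phi) - 1) < tol))
  )
}

# Made-up matrix with one missing entry.
phi <- rbind(c(0.5, 0.5, 0.0),
             c(0.2, NA,  0.8))
str(check_phi(phi))
```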