dist() with jensenShannon returns NaN
lmkirvan opened this issue · 6 comments
I really enjoy this package and appreciate your work on it. I've previously used it successfully, but updated the package recently and now get an error that I previously had not encountered.
I can't quite figure out why (as the jensenShannon distance function looks okay), but
`jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x/m)) + 0.5 * sum(y * log(y/m))
}
dist.mat <- proxy::dist(x = parems$phi, method = jensenShannon)`
returns NaN using phi.
Hi - I'm glad you're finding the package useful!
Regarding the NaN - is it possible you have a NA value in phi? Also - a long shot here - did you mean to write params$phi rather than parems$phi, i.e. just a typo?
If you want to share the data to make the error reproducible, that might also help us troubleshoot.
-k
I can save the values of phi if that would be helpful. Let me know and I
can send it to you via email, or upload it to a GitHub repo. The phi I'm
using does not include any NA values and all rows sum to 1. There are
several zero values (because of rounding), but I understood that wouldn't
be a problem. I think that it's a problem with the distance function as
written.
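jsPCA2 <- function(phi) {
  jensenShannon2 <- function(x, y) {
    m <- 0.5 * (x + y)
    0.5 * sum(x * log(x/m)) + 0.5 * sum(y * log(y/m))
  }
  # Jaccard swapped in for jensenShannon2 to test whether the metric is at fault
  dist.mat <- proxy::dist(x = phi, method = 'Jaccard')
  # Early return to inspect the distance matrix; the lines below do not run
  return(dist.mat)
  pca.fit <- stats::cmdscale(dist.mat, k = 2)
  data.frame(x = pca.fit[, 1], y = pca.fit[, 2])
}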
As you can see, I edited the jsPCA function and using another distance
metric (chosen at random) does not return NaN.
I've also spotted a question on SO that looks like someone is experiencing
a similar problem.
http://stackoverflow.com/questions/35830008/r-ldavis-k-2-createjson-error
Let me know if you'd like the phi file.
Thanks for your help.
-L
On Fri, Apr 22, 2016 at 3:07 PM, Kenny Shirley notifications@github.com
wrote:
Hi - I'm glad you're finding the package useful!
Regarding the NaN - is it possible you have a NA value in phi? Also - a
long shot here - did you mean to write params$phi rather than parems$phi,
i.e. just a typo?
If you want to share the data to make the error reproducible, that might
also help us troubleshoot.
-k
NaN is returned when the phi matrix contains 0 values. That's why you have to add a constant to every value in the phi matrix, as is done in the tutorial.
Hi Marcin, thanks for the great package.
I think the solution should not be to add a constant. The problem appears because R sets 0*log(0) as NaN. But mathematically, the limit of x log(x) for x to 0 is 0. Therefore, the summand in the jensenShannon metric should be 0. For example, you could replace
sum(x * log(x/m))
by
sum(ifelse(x == 0, 0, x * log(x/m)))
Best, Maren
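The suggestion above can be sketched in base R as follows. This is only an illustration under stated assumptions: the names klTerm and jensenShannonSafe are hypothetical and not part of the package, and the maintainers may prefer a different fix. Each summand is set to 0 wherever the corresponding probability is 0, following the limit of x log(x) as x goes to 0.

```r
# Hypothetical helper (not from the package): one half of the Jensen-Shannon
# sum, where zero entries of p contribute 0 instead of NaN.
klTerm <- function(p, m) {
  sum(ifelse(p == 0, 0, p * log(p / m)))
}

# Hypothetical guarded variant of the distance function quoted in this thread.
jensenShannonSafe <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * klTerm(x, m) + 0.5 * klTerm(y, m)
}

# Unguarded version, as posted at the top of the thread.
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# Two toy distributions that each contain a zero.
x <- c(0.5, 0.5, 0)
y <- c(0, 0.5, 0.5)

print(jensenShannon(x, y))      # NaN, because of 0 * log(0)
print(jensenShannonSafe(x, y))  # 0.5 * log(2), about 0.347
```

Assuming proxy::dist accepts a function for its method argument, as in the call at the top of this thread, the guarded version could be passed the same way: proxy::dist(x = phi, method = jensenShannonSafe).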
@Maren-Eckhoff that's a great solution.
I'm not the owner of the package, but @cpsievert is and might like to know about this improvement.
I have encountered the same issue.
I then applied the fix mentioned above by @Maren-Eckhoff (thanks!). It works in most cases but still fails in some, where jsPCA returns the error: infinite or missing values in 'x'
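One way to narrow down a failure like this is to check phi itself for entries that the zero guard does not cover. The sketch below is a hypothetical diagnostic (checkPhi is not a package function) and does not claim to identify the cause of the error reported here. It only counts non-finite and negative entries, either of which can still produce NaN or Inf in the distance computation.

```r
# Hypothetical diagnostic: count entries of a topic-term matrix that can
# still yield non-finite distances even with the zero guard in place.
checkPhi <- function(phi) {
  c(
    nonFinite = sum(!is.finite(phi)),        # NA, NaN, Inf or -Inf entries
    negative  = sum(phi < 0, na.rm = TRUE)   # log() of a negative ratio is NaN
  )
}

# Toy matrix with one NaN entry and no negative entries.
phi <- rbind(c(0.5, 0.5, 0),
             c(0.5, NaN, 0.5))
print(checkPhi(phi))
```

If both counts are 0 and the error persists, sharing a small reproducible phi would probably be the quickest way to get it diagnosed.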