
vegdist() in disagreement with designdist()

aloboa opened this issue · 2 comments


> x <- rbind(c(0,1,1,1), c(0,1,0,1))
> x
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    1
[2,]    0    1    0    1

and the derived 2x2 frequency table

> xf
     [,1] [,2]
[1,]    2    1
[2,]    0    1

which corresponds to
a b
c d

I get:

> vegdist(xf, method="jaccard")
2 0.6666667

which is in agreement with the definition a/(a+b+c)

> 2/(2+1+0)
[1] 0.6666667

But I do not get the same result using designdist():

> designdist(xf, method="a/(a+b+c)", abcd=TRUE)
2 0.5
> designdist(xf, method="(A+B-2*J)/(A+B-J)", abcd=FALSE)
2 0.5

Am I not understanding designdist() or is there a problem with that function?

The Jaccard distance is not defined as a / (a + b + c + d). It is (b + c) / (a + b + c). You are not supposed to form the two-way table and then run that through vegdist(); the correct way is to pass the actual data, which in your case is x:

> vegdist(x, method = "jaccard", binary = TRUE)
2 0.3333333

which corresponds with (1 + 0) / (2 + 1 + 0) = (b + c) / (a + b + c):

> (1 + 0) / (2 + 1 + 0)
[1] 0.3333333

and which corresponds with designdist():

> designdist(x, method = "(A+B-2*J)/(A+B-J)")
2 0.3333333
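
For readers who want to see where a, b and c come from, the following is a minimal base-R sketch that counts them by hand from the two rows of x. The variable names (p1, p2, n_shared, n_only1, n_only2) are illustrative only, and this is a hand calculation of the formula above, not a description of how vegdist() or designdist() work internally.

```r
# Two sites (rows) by four species (columns), as in the question.
x <- rbind(c(0, 1, 1, 1), c(0, 1, 0, 1))

# Reduce each row to presence/absence.
p1 <- x[1, ] > 0
p2 <- x[2, ] > 0

n_shared <- sum(p1 & p2)    # "a": species present in both rows
n_only1  <- sum(p1 & !p2)   # "b": species present only in row 1
n_only2  <- sum(!p1 & p2)   # "c": species present only in row 2

# Jaccard distance (b + c) / (a + b + c)
jaccard_dist <- (n_only1 + n_only2) / (n_shared + n_only1 + n_only2)
print(jaccard_dist)  # 1/3, the value reported above for x
```

Here a = 2, b = 1 and c = 0, so the hand calculation gives 1/3, the same value as the two calls on x above.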

I don't actually know what coefficient a/(a+b+c+d) yields but it certainly isn't the Jaccard distance nor is it the simple matching coefficient, and my copy of Legendre & Legendre (where I have to look these things up) is at home just now so I can't ask it.

Thanks, but note that in the code I am not actually using a/(a+b+c+d); I am using a/(a+b+c) (which is 2/(2+1+0)). That was a typo in the text, which I have now corrected.

And you are totally right that I was confusing similarity with distance
(1 - a/(a+b+c) = (b+c)/(a+b+c)),
which actually clarifies the issue.
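
To make the similarity/distance relationship concrete, here is a small base-R sketch. The helper jaccard_parts() is hypothetical (it is not part of vegan). It computes both quantities from a pair of rows after reducing them to presence/absence. Applying it to the rows of xf also suggests where the 0.5 from designdist(xf, ...) comes from, on the assumption that designdist() treats each row of its input as a site and, by default, works on presence/absence.

```r
# Hypothetical helper: Jaccard similarity and distance for two rows,
# computed on presence/absence.
jaccard_parts <- function(r1, r2) {
  p1 <- r1 > 0
  p2 <- r2 > 0
  n_a <- sum(p1 & p2)    # present in both rows
  n_b <- sum(p1 & !p2)   # present only in the first row
  n_c <- sum(!p1 & p2)   # present only in the second row
  c(similarity = n_a / (n_a + n_b + n_c),
    distance   = (n_b + n_c) / (n_a + n_b + n_c))
}

x  <- rbind(c(0, 1, 1, 1), c(0, 1, 0, 1))  # the raw data
xf <- rbind(c(2, 1), c(0, 1))              # the derived 2x2 table

from_x  <- jaccard_parts(x[1, ], x[2, ])    # similarity 2/3, distance 1/3
from_xf <- jaccard_parts(xf[1, ], xf[2, ])  # similarity 1/2, distance 1/2

print(from_x)
print(from_xf)
print(sum(from_x))  # similarity + distance = 1
```

With the raw data x, the similarity is 2/3 and the distance is 1/3. When the rows of xf are treated as if they were sites, both quantities come out as 0.5, which is the value designdist() returned for xf in the question.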