talgalili/dendextend

dist.dendlist seems to give wrong results as compared with ape

talgalili opened this issue · 0 comments

Reported by Cooley, Nicholas P:

According to the R package ape, the two trees constructed below have a Robinson-Foulds distance of 4.

The code used to get that value is included below:

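library(ape)        # unroot, read.tree, dist.topo
library(DECIPHER)   # WriteDendrogram
library(dendextend) # set, dend_diff, dist.dendlist, distinct_edges, %>%

# Build a dendrogram on six points, then a copy with the last three labels permuted
x <- 1:6 %>% dist %>% hclust %>% as.dendrogram
y <- set(x, "labels", c(1:3,6,4,5))

# Compare the two dendrograms with dendextend
dend_diff(x,y)
dist.dendlist(dendlist(x,y))
distinct_edges(x,y)
distinct_edges(y,x)
length(distinct_edges(x,y))+length(distinct_edges(y,x)) # same quantity as dist.dendlist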

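# Use character labels so the trees can be written out in Newick format
z <- set(x, "labels", as.character(1:6))
w <- set(y, "labels", as.character(c(1:3,6,4,5)))

# Write the first dendrogram to a temporary file, read it back with ape, and unroot it
TempTree <- tempfile()
WriteDendrogram(x = z,
                file = TempTree,
                quoteLabels = FALSE,
                append = FALSE)
v <- unroot(read.tree(TempTree))
unlink(TempTree)

# Same round trip for the second dendrogram
TempTree <- tempfile()
WriteDendrogram(x = w,
                file = TempTree,
                quoteLabels = FALSE,
                append = FALSE)
u <- unroot(read.tree(TempTree))
unlink(TempTree)

# Robinson-Foulds distance between the two unrooted trees, as computed by ape
dist.topo(x = v,
          y = u)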

Hope this helps clarify things for you! If you need anything else, just let me know.

As I understand it, it’s a little more than just the unrooting. Your measure of unique branch paths looks at tips that have different parent nodes, while the RF distance looks at whole data partitions. So a partition on your left dendrogram, ((1,2)(3,4)), is not repeated in the tree on the right, while the partition (1,2) is repeated in both trees, and so on?
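To make the partition argument concrete, here is a minimal base R sketch that counts differing partitions by hand, once treating the trees as rooted and once as unrooted. It assumes the two topologies are (((1,2),(3,4)),(5,6)) and (((1,2),(3,6)),(4,5)), which is what the hclust call above appears to produce. The clade lists and helper names (key, rooted_keys, unrooted_keys, sym_diff_size) are hypothetical and written only for illustration; this is not how dendextend or ape implement their distances.

```r
# Tip labels shared by both trees
tips <- as.character(1:6)

# Non-trivial clades of each rooted tree, written out by hand (assumed topologies)
clades_x <- list(c("1", "2"), c("3", "4"), c("1", "2", "3", "4"), c("5", "6"))
clades_y <- list(c("1", "2"), c("3", "6"), c("1", "2", "3", "6"), c("4", "5"))

# Encode a set of tips as a sorted, comma-separated key so sets can be compared
key <- function(s) paste(sort(s), collapse = ",")

# Rooted view: every clade is compared as-is
rooted_keys <- function(clades) unique(vapply(clades, key, character(1)))

# Unrooted view: a clade and its complement describe the same split, so keep
# the side that does not contain the first tip, then drop trivial splits
unrooted_keys <- function(clades, tips) {
  sides <- lapply(clades, function(s) if (tips[1] %in% s) setdiff(tips, s) else s)
  sides <- Filter(function(s) length(s) >= 2 && length(s) <= length(tips) - 2, sides)
  unique(vapply(sides, key, character(1)))
}

# Size of the symmetric difference between two sets of keys
sym_diff_size <- function(a, b) length(setdiff(a, b)) + length(setdiff(b, a))

rooted_dist   <- sym_diff_size(rooted_keys(clades_x), rooted_keys(clades_y))
unrooted_dist <- sym_diff_size(unrooted_keys(clades_x, tips), unrooted_keys(clades_y, tips))

print(rooted_dist)   # 6
print(unrooted_dist) # 4
```

Under these assumptions the unrooted count matches the value of 4 reported by ape. In the rooted view, (1,2,3,4) and (5,6) count as two separate clades, whereas in the unrooted view they are the two sides of a single split. That may account for part of the gap between the two measures.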

I don’t know particularly if you need to account for the unrooting. But if you can clearly argue that your branch history measure is just as valid as an RF distance, that might be pretty cool. I’ve looked at a couple of data sets where I collected both your measure and RF distances, and the differences between the two seem pretty uniform, though I never really dug much into it.