tjunier/newick_utils

nw_stats feature request

alephreish opened this issue · 5 comments

Two trees ((1,2),3,(4,5)); and ((1,2),3,(4,5,6)); have exactly the same number of splits (2), yet nw_stats reports incorrect results for the tree with the multifurcation:

$ nw_stats - -f l <<< '((1,2),3,(4,5));'   | cut -f4
2
$ nw_stats - -f l <<< '((1,2),3,(4,5,6));' | cut -f4
1

This is a rather serious bug.

Or even:

$ nw_stats - -f l <<< '((1,2,3),(4,5,6),(7,8,9));' | cut -f4
0

Any comments?

@har-wradim This field is the number of dichotomies (bifurcations), so the results are correct. By "splits", do you mean internal nodes? That value is not reported (but it probably should be, as an additional value).
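The difference between dichotomies and internal nodes is easy to check by hand. The sketch below is a hypothetical `newick_counts` helper, not part of newick_utils. It walks a single Newick string and treats every `(` as a new inner node. A node whose level holds exactly one comma when its `)` is reached has two children and counts as a dichotomy. It assumes plain Newick: no quoted labels containing `(`, `)` or `,`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of newick_utils): print "<inner nodes> <dichotomies>"
# for the Newick tree given as $1. Assumes no quoted labels containing '(' ')' ','.
newick_counts() {
    awk -v tree="$1" 'BEGIN {
        depth = 0; inner = 0; dich = 0
        n = length(tree)
        for (i = 1; i <= n; i++) {
            c = substr(tree, i, 1)
            if (c == "(") {              # a new inner node opens
                depth++; inner++; commas[depth] = 0
            } else if (c == ",") {       # one more child at this level
                commas[depth]++
            } else if (c == ")") {       # node closes: children = commas + 1
                if (commas[depth] == 1) dich++
                depth--
            }
        }
        printf "%d %d\n", inner, dich
    }'
}
```

On the four trees in this thread it prints `3 2`, `3 1`, `4 0` and `5 2`. The second number matches the fourth field that `nw_stats -f l` reports above, and the root is counted as a dichotomy whenever it has exactly two children.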

OK, I see:

$ nw_stats - -f l <<< '(((1,2,3),(4,5,6)),(7,8,9));' | cut -f4
2

No, by splits I mean bipartitions (partitions of the taxa induced by the edges of a tree).
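Splits can also be listed mechanically. Take an unrooted tree written with a basal multifurcation. Every inner node other than the root separates the leaves below it from all remaining leaves, which gives one non-trivial split. The sketch below is a hypothetical `newick_splits` helper, not part of newick_utils. It prints that leaf set for each such node. It assumes unquoted labels without whitespace, ignores branch lengths (text after `:`), and skips inner-node labels (text right after `)`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of newick_utils): print one side of every
# non-trivial split, i.e. the leaf set below each non-root inner node.
# Assumes an unrooted tree with a basal multifurcation and plain labels.
newick_splits() {
    awk -v tree="$1" '
    function join(a, b) { return (a == "") ? b : (a "," b) }
    BEGIN {
        depth = 0; label = ""; skip = 0; inner_label = 0
        n = length(tree)
        for (i = 1; i <= n; i++) {
            c = substr(tree, i, 1)
            if (c == "(" || c == "," || c == ")" || c == ";") {
                # a finished leaf label joins the leaf set of the current node
                if (label != "" && !inner_label)
                    leaves[depth] = join(leaves[depth], label)
                label = ""; skip = 0; inner_label = 0
                if (c == "(") {
                    depth++; leaves[depth] = ""
                } else if (c == ")") {
                    clade = leaves[depth]; depth--
                    if (depth > 0) {                 # the root defines no split
                        print "{" clade "}"
                        leaves[depth] = join(leaves[depth], clade)
                    }
                    inner_label = 1                  # text after ")" names an inner node
                }
            } else if (c == ":") {
                skip = 1                             # ignore branch lengths
            } else if (c != " " && !skip) {
                label = label c
            }
        }
    }'
}
```

For the two trees in the opening post it prints `{1,2}` and `{4,5}`, then `{1,2}` and `{4,5,6}`. That is two splits each, even though the second tree has one dichotomy fewer.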

My mistake stems from the fact that one would normally expect the number of splits, rather than the number of dichotomies, as one of the summary statistics for an unrooted tree.

Let's convert this thread into a feature request.

Right. They are intimately related (of course): the number of non-trivial bipartitions equals num.internal.nodes - 1 for unrooted trees and num.internal.nodes - 2 for rooted trees (with a bifurcating root). I agree this would be a useful property to report.
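The relation above can be turned into a quick count without building the tree. This is a minimal sketch, assuming no `(` appears inside labels, so each `(` opens exactly one inner node. `count_bipartitions` is a hypothetical helper, not an nw_stats option. The second argument tells it whether to apply the unrooted (`- 1`) or rooted (`- 2`) form.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of newick_utils): number of non-trivial
# bipartitions from the inner-node count.
# $1: Newick string; $2: "unrooted" (basal multifurcation) or "rooted" (bifurcating root).
# Assumes no '(' inside labels, so counting '(' counts inner nodes.
count_bipartitions() {
    local inner
    inner=$(printf '%s' "$1" | tr -cd '(' | wc -c)
    # arithmetic expansion ignores any padding wc adds around the number
    if [ "$2" = rooted ]; then
        echo $((inner - 2))
    else
        echo $((inner - 1))
    fi
}
```

`count_bipartitions '((1,2),3,(4,5,6));' unrooted` prints 2, the number of splits stated in the opening post. `count_bipartitions '((1,2),(3,4));' rooted` prints 1, because the two edges at a bifurcating root define the same split, {1,2}|{3,4}.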