Reroot at edge may assign edge labels incorrectly
Hi. I'm not 100% sure this is a bug, but it looks like it (could also be a bug in me!). I have an unrooted tree produced by iqtree2. I open it with dendropy, find a taxon node I'm interested in `node = tree.find_node_with_taxon_label("name")`, check the node is found `assert node is not None`, and then reroot at the edge of that node: `tree.reroot_at_edge(node.edge)`.
In examining the resulting tree, I found an edge with no label on it. I wrote a tiny script to print all nodes and their edge labels (if internal) and taxon names (if not). It also prints the lengths of all edges. In a close examination of the resulting tree, I see that one of the edges of the original pseudo-root (i.e., the first node in the unrooted Newick, with valence 3) no longer has a label on it. It is the edge that leads in the direction of the root after the rerooting. I made a sketch of that part of the tree before and after the rerooting. You can see that the edge labels leading from the original pseudo-root in the direction of the edge that will be rooted on are 79, 11, and 14. After the rerooting, those labels have been assigned to edges one closer to the new root (you can see this clearly from the lengths of the edges, which are unchanged). This shifting of edge labels leaves one edge (the one labeled with 79 in the pre-rooting tree on the left in the attached image) with no label.
I took a quick look at the source to see if I could spot any problem, but didn't. I can have another look when I have more time, and can try to make a small test example to show the behavior.
I thought I would post my image here though first, to see if there are any early comments - maybe this is known or I am doing something dumb (often the case).
Thanks!

Hey Terry— Thanks for the report!! The diagram is very helpful. Would it be possible to share a copy of the script commands and data file that reproduces the issue? Could you also confirm the version of dendropy you have installed and the python version you are using?
Hi. I've added a zip file for you, containing:
- `original.nwk` - the original Newick tree.
- `original.txt` - a simple dump of the nodes of the tree, showing internal node labels, tip names, and edge lengths (the first number on each line is distance from the root, as shown next to nodes in my figure above).
- `rerooted.nwk` - the tree re-rooted on the edge leading to the `NC_001896.1` tip.
- `rerooted.txt` - a dump of the nodes of the re-rooted tree. Look near the bottom and you'll see the node described above (I have renamed all my tips for a data security reason, but the edge labels and lengths are identical to those in the sketch, so it's easy to see the edges in question).
- `reroot.py` - uses `reroot_at_edge` to produce `rerooted.nwk` from `original.nwk`.
- `Makefile` - to save you one line of typing :-)
I am running dendropy 5.0.2 on OS X 14.5, run with Python 3.10.13.
Thanks!
Great! I’ll have a look and follow up shortly 👍 Thanks again for the report!
Thanks Matthew. At first I was thinking this is kind-of minor, but now I realize it could lead to something more serious. Given that the support values seem to be shifted (creating the node with no support label), presumably (after the rerooting) there is a label being dropped at the end of the chain of edges leading to the new root. It could be the case that a high support value there is dropped with a low one assigned in its place - perhaps causing misinterpretation of the tree. Or (more likely) the reverse could happen, with a lower support value replaced by a higher one. I will try to make time to look again at the code. Thanks for the reply.
Hi again Matthew. I have been playing around, looking at the source code, thinking, etc. I no longer think this might be a bug, but it does raise another issue. First of all, I made a very small example. I have a file, `simple.nwk`, containing `(A:1,B:2,(C:5,D:4)3:3);` and I run the following code on it:
```python
from dendropy import Tree

tree = Tree.get(path="simple.nwk", schema="newick")
node = tree.find_node_with_taxon_label("C")
tree.reroot_at_edge(node.edge, length1=10, length2=20)
tree.write(path="rerooted.nwk", schema="newick")
```
That is shown below:

There is only one support value (green 3) in the original tree. I realized that when the tree-making algorithm picks an arbitrary node to use as the pseudo-root when writing out the Newick, that pseudo-root node cannot have a support value on it. The edges coming into it are either from tips (no support by definition) or from internal nodes, in which case the support value is on the node whose edge goes to the pseudo-root. The support value is on the same node after the rerooting, but it's no longer valid to interpret it as a support value. Originally it tells you how often C and D make a two-tip clade, but in the rerooted tree there is no such clade.
I made a slightly bigger example by inserting another node (E) into the above. Re-rooting on C again, we get this:

The two support values remain on their original nodes, but it is now incorrect to interpret them as bootstrap support values because the root has changed sides in the tree, and the clades they were relevant to (from the POV of the original pseudo-root) no longer exist (from the POV of the new root) and interpreting them as still being valid would be wrong.
OK, sorry for all that text. I seem to have convinced myself of two things:
- The values on the nodes actually are correctly preserved (and they may not even be support values to begin with, they could be internal node names, in which case they certainly should not be altered). I.e., this issue is not a bug in dendropy, but in me :-)
- That if you do have support values, then after rerooting they (at least the ones between the old and new roots) will be invalid and have to be recalculated. I guess that means that if you intend to do both rerooting and bootstrapping, you should do them in that order.
Thanks for reading along. Comments very welcome!
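The argument above can be checked with a short, library-free sketch. It is an illustration only, not dendropy code: the adjacency map and the node names `R` (the pseudo-root) and `X` (the internal node labelled 3) are assumptions chosen to mirror `(A:1,B:2,(C:5,D:4)3:3);`.

```python
# Unrooted topology of (A:1,B:2,(C:5,D:4)3:3); as an adjacency map.
# "R" and "X" are hypothetical names for the pseudo-root and the node labelled 3.
ADJ = {
    "R": ["A", "B", "X"],
    "X": ["R", "C", "D"],
    "A": ["R"],
    "B": ["R"],
    "C": ["X"],
    "D": ["X"],
}


def leaves_below(node, parent):
    """Leaf names reachable from `node` without going back through `parent`."""
    children = [n for n in ADJ[node] if n != parent]
    if not children:
        return frozenset([node])
    return frozenset().union(*(leaves_below(c, node) for c in children))


# Original rooting: the parent of X is R, so the label 3 describes the clade {C, D}.
before = leaves_below("X", "R")

# Rooted on the edge leading to C: the new root sits between X and C,
# so X now subtends everything except C.
after = leaves_below("X", "C")

# The split {A, B} | {C, D} that the value 3 supported is now the edge above R.
moved_to = leaves_below("R", "X")

print(sorted(before))    # ['C', 'D']
print(sorted(after))     # ['A', 'B', 'D']
print(sorted(moved_to))  # ['A', 'B']
```

The label stays on `X`, but the set of tips below `X` changes, which is why the value can no longer be read as support for a C and D clade.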
According to the "A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits" paper (https://academic.oup.com/mbe/article/34/6/1535/3077051):
Dendropy loads inner node labels as node attributes. Therefore, if those labels are meant to represent support values, rerooting will lead to incorrect results. The Dendropy documentation explains this behavior in detail, and a workaround is available that permits to reroot trees where bootstrap values are encoded as node labels in the Newick format. A new option has been added in version 4.2 that allows to automatically translate node labels into branch support values when loading a Newick tree, so rerooting algorithms can be safely applied without further tree processing.
But I don't see any sign of such an option in the Dendropy docs. Does it actually exist?
Thanks!
Final update (I guess). I tried rooting (my exact data) with ete3 and it does not have this issue. It moves the labels so that they can be interpreted as support values in the direction of the new root.
I think dendropy and ete3 could both be improved regarding rooting. Dendropy never moves the labels. This is correct if these are in fact node names (or some other characteristic of the node) but not correct if the labels are support values. ete3 is the opposite, it always moves the labels, which is correct if they are support values but wrong if they are node names etc.
It would make sense to me (assuming I understand things properly!) for both packages to allow one to specify, when rooting, whether node labels are support values (or, more generally, whether the labels apply to the edge leading from the node in the direction of the root).
Thanks for listening and replying! If you want to see code that produces a correct (for support values) rooting via ete3 for the above tree/data, I can supply that.
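The suggestion above can be sketched as a single flag that decides whether labels stay on their nodes or travel with their edges. This is a hypothetical helper, not an API of dendropy or ete3: the function name `relabel_for_reroot`, the `labels_are_support` flag, and the parent-map representation are all assumptions, and the sketch assumes a trifurcating pseudo-root that carries no label of its own.

```python
def relabel_for_reroot(parent, labels, child, labels_are_support=True):
    """Return node labels after rerooting on the edge above `child`.

    parent: dict mapping each non-root node to its parent in the old rooting.
    labels: dict mapping node -> label (a missing key means unlabelled).
    """
    if not labels_are_support:
        # Node names describe the node itself, so they stay where they are.
        return dict(labels)
    new_labels = dict(labels)
    # The rooted-on edge is split in two and both halves represent the same
    # split, so the node on the far side inherits the label of `child`.
    carried = labels.get(child)
    node = parent[child]
    while node is not None:
        # Every edge on the path to the old root now hangs below the node that
        # used to be its parent end, so each label moves one step along the path.
        carried, previous = labels.get(node), carried
        if previous is None:
            new_labels.pop(node, None)
        else:
            new_labels[node] = previous
        node = parent.get(node)
    return new_labels


# The small tree from this thread: R is the pseudo-root, X the node labelled 3.
PARENT = {"A": "R", "B": "R", "X": "R", "C": "X", "D": "X"}
LABELS = {"X": "3"}

print(relabel_for_reroot(PARENT, LABELS, "C"))                            # {'R': '3'}
print(relabel_for_reroot(PARENT, LABELS, "C", labels_are_support=False))  # {'X': '3'}
```

With the flag on, the value 3 ends up on the old pseudo-root, whose edge toward the new root still separates A and B from C and D. With the flag off, the label is treated as a node name and left alone.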
Hi @terrycojones ---
First off, thank you very much for your diligence in investigating this issue. In particular, digging up that snippet in the support review paper is incredibly helpful. Apologies for the delay in following up here!
I looked through the git history around release v4.2.0 to identify which keyword argument was mentioned in that paper. It looks like it's `is_assign_internal_labels_to_edges`. You are correct that this was not rendering into the documentation. I fixed that in 370b77b and now the documentation appears to be rendering this option correctly (see https://jeetsukumaran.github.io/DendroPy/schemas/newick.html#schema-specific-keyword-arguments).
I agree that some action needs to be taken to prevent users from inadvertently encountering these subtle issues around rerooting and support values. For this reason, I opened PR #219 to produce a warning if "support" is detected within any node attributes.
- I would be interested to see if that warning triggers for your use case that encounters the support issue.
- Let me know if you have any suggestions for the warning message in #219, or for the conditions under which it is triggered.
- Does the `is_assign_internal_labels_to_edges` option resolve the rerooting/support issue for your use case?
Hi @mmore500 I only just saw this (I was just on vacation for two weeks). I will try your suggestions ASAP, but have to go to a conference for the rest of this week so it won't happen before next week. Thanks!
No worries! Safe travels, and I’ll keep an eye out for updates on the thread 👍