Network "reflection" issue
Closed this issue · 3 comments
Dear all,
First of all, thanks for making your implementation publicly available!
I was trying to use it but ran into a few bumps and traced it back to an issue regarding the tree writing order. Take the "simple example" from the documentation, for instance:
from zss import simple_distance, Node
A = (
Node("f")
.addkid(Node("a")
.addkid(Node("h"))
.addkid(Node("c")
.addkid(Node("l"))))
.addkid(Node("e"))
)
B = (
Node("f")
.addkid(Node("a")
.addkid(Node("d"))
.addkid(Node("c")
.addkid(Node("b"))))
.addkid(Node("e"))
)
simple_distance(A, B)
Now if we write B reverting the order of nodes "a" and "e", that is:
C = (
Node("f")
.addkid(Node("e"))
.addkid(Node("a")
.addkid(Node("d"))
.addkid(Node("c")
.addkid(Node("b"))))
)
the result is no longer 2, but 4.
Note that, from the point of view of the network itself, B and C are the same network, we are just writing the edges in a different order, so that the result should be the same (in principle).
If we calculate simple_distance(B, C) it will give 2, not 0.
I came across this problem because I was trying to implement a code that receives a pair of trees from the user (as edge list), and calculate the Zhang-Shasha edit distance. I used networkx topological_sort function to create the roots and used the network in the example to test it. But the result was not the expected 2, exactly because of the order in which edges are written in my code.
A workaround on my side is to use sorted lists for the neighbors, but I wondered if it is possible to have your code correct this issue directly (maybe the same workaround would solve, but I don't no since I didn't dig into your code).
If you want, I can send you the wrapper I'm writing.
Cheers!
@marfcg that behavior is expected. The reason is this particular implementation works on ordered labeled trees. In other words the ordering of the children is considered important. I also think that it may be non-trivial to consider unordered trees since you would then need to match each child up with the lowest cost child of the other tree. I think you solution (sorting the lists of children) is probably the correct solution in the case that child order is not important for you. In the application I originally wrote this for I did care about order because I was comparing Abstract Syntax Trees.
@timtadh Oh, I see. ;) Thanks for the clarification!
Due to my particular field of study on networks, for some reason I understood "ordered" as being equivalent to "directed", which are in fact two different animals.
Cheers!
no problem. Good luck!