timtadh/zhang-shasha

Does zss accept BinaryTree as a tree structure?

Closed this issue · 12 comments

I'm using the zss to calculate the edit distance between two binary trees. I have mathematical expressions as Strings (e.g. 2 * 3, (4 * 6) + 1, (20 - 40) * (3 / 3)). They are in the form of binary trees. I'm trying to use your library to get a measurement of (dis)similarity using edit distance.

In the document it says; zss.simple_distance(A,B) A and B should be two trees. Does it have to be in the format specified in the documentation? I tried passing the binary tree to the method and it resulted in an error. I'm using the following code to generate my trees. I'm new to Python. Could you please explain how should I encode the String expression to zss.simple_distance()?

from pythonds.basic.stack import Stack
from pythonds.trees.binaryTree import BinaryTree

def buildParseTree(fpexp):
    fplist = fpexp.split()
    pStack = Stack()
    eTree = BinaryTree('')
    pStack.push(eTree)
    currentTree = eTree
    for i in fplist:
        if i == '(':
            currentTree.insertLeft('')
            pStack.push(currentTree)
            currentTree = currentTree.getLeftChild()
        elif i not in ['+', '-', '*', '/', ')']:
            currentTree.setRootVal(int(i))
            parent = pStack.pop()
            currentTree = parent
        elif i in ['+', '-', '*', '/']:
            currentTree.setRootVal(i)
            currentTree.insertRight('')
            pStack.push(currentTree)
            currentTree = currentTree.getRightChild()
        elif i == ')':
            currentTree = pStack.pop()
        else:
            raise ValueError
    return eTree

pt = buildParseTree("( ( ( 75 + 2 ) - 21 ) + ( 94 - 90 ) )")

Thank you.

I should say, if you have additional questions let me know. Thanks.

Thank you @timtadh. I want to confirm something. Let's assume I have a tree as follows.

A = (
	Node('+')
		.addkid(Node('2'))
		.addkid(Node('3'))
	)

What would be the left node in this case? 3 or 2?

 +       +
 /\      /\
2  3 	3  2

I think the .addkid() uses a list to add children at each level and therefore first entered the list should be the left node. So in this case 3 should be the left child. Am I correct?

Hi @yasodajayaweera ,

The nodes are inserted left to right with addkid(). The tree looks like:

  +
 / \
2   3

You might also like my betterast package which support visualization via the dotty() method using graphviz:

@timtadh Thank you for the reply. I saved my node structures as text files and I'm trying to read them as Node objects and compute the distance. When I do the following I get the distance as 1. I'm using the same tree so I should get zero ideally. Am I doing anything wrong?

code:

from zss import simple_distance, Node

f = open("C:/.../A.txt","r")
f2 = open("C:/.../A.txt","r")

A = Node(f.readlines())
B = Node(f2.readlines())

dist = simple_distance(A,B)
print(dist)

File structure:

Node("+") 
	.addkid(Node("+") 
		.addkid(Node("2")) 
		.addkid(Node("4")))
	.addkid(Node("*") 
		.addkid(Node("3")) 
		.addkid(Node("5"))) 

@yasodajayaweera that will not work. Your "text files" contain python code, you need to execute it for it form the tree structure. Reading them in simply causes the string to be the label on the node. You should put them in python files and import them or serialize and load your trees.

@timtadh Thank you for the quick response. I understand what you mean. I tried converting the structure into python files and importing, the but it did not work. However I'm kind of lost at the moment.

Following is the algorithm I'm using.
For example I have a mathematical expression ( 2 * 3) ini the string format and I want to create a tree structure.

  1. First I break the expression into tokens [‘(’, ‘2’, ‘*’, ‘3’, ‘)’]
  2. Create an empty tree
  3. Read ‘(‘ as the first token. Create a new node as the left child of the root. Make the current node this new child.
  4. Read 2 as the next token. Set the root value of the current node to 2 and go back up the tree to the parent.
  5. Read * as the next token. Set the root value of the current node to * and add a new node as the right child. The new right child becomes the current node.
  6. Read 3 as the next token. Set the root value of the current node to 3. Make the parent of 3 the current node.
  7. Read ) as the next token. Make the parent of * the current node. At this point there is no parent for * so we are done.

So I'm trying to build a tree using the Node structure in zss. I tried implementing as follows. However it doesn't seem to be right. I'm trying to create a Binary tree using the Node structure.

from pythonds.basic.stack import Stack
from zss import Node

class BinaryTreeNode(object):

    def __init__(self, label = None):
        self.my_label = label
        self.my_children = [None]*2 #in a binary tree a node always have two children 

    @staticmethod
    def get_children(node):
        return node.my_children

    def get_right_child(self):
        return self.my_children[1]

    def get_left_child(self):
        return self.my_children[0]

    @staticmethod
    def get_label(node):
        return node.my_label

    def set_label(self,value):
        self.my_label = value

    def addkid(self, node, before=False):
        if before:
            self.my_children.insert(0, node)
        else:
            self.my_children.insert(1,node)
        return self

def buildParseTree(fpexp):
    fplist = fpexp.split()
    pStack = Stack()
    eTree = BinaryTreeNode('')
    pStack.push(eTree)
    currentTree = eTree
    for i in fplist:
        if i == '(':
            currentTree.addkid(BinaryTreeNode(''))
            pStack.push(currentTree)
            currentTree = currentTree.get_left_child()
        elif i not in ['+', '-', '*', '/', ')']:
            currentTree.set_label(int(i))
            parent = pStack.pop()
            currentTree = parent
        elif i in ['+', '-', '*', '/']:
            currentTree.set_label(i)
            currentTree.addkid(BinaryTreeNode(''))
            pStack.push(currentTree)
            currentTree = currentTree.get_right_child()
        elif i == ')':
            currentTree = pStack.pop()
        else:
            raise ValueError
    return eTree

pt = buildParseTree("( 2 + 4 )")

Will you be able to give some feedback on this? Is there an easy way to go about this?

Here is an example of parsing expression trees with PLY that creates an AST: https://github.com/timtadh/PyOhio2011/blob/master/astgen_lalr.py

@timtadh Thank you very much. I'll check the link.

Also, if you are interested in parsing and have not taken a compiler's course this is a really good free one: https://www.youtube.com/playlist?list=PLFB9EC7B8FE963EB8 . If you prefer a book (I find them helpful) a decent one is https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811 (widely available in libraries). Another popular one is: https://www.amazon.com/Engineering-Compiler-Keith-Cooper/dp/012088478X

@timtadh Thank you very much. I'm quite new to Python and the PLY and AST seems bit too difficult to grasp :) However I'm trying to convert the tree into a nested list format as you have shown in one of your examples. Hopefully it might work. Thank you again.

Ok. Here is a recursive descent version of the same grammar in Java (also by me): https://github.com/hacsoc/expr/blob/master/src/main/java/cwru/hacsoc/expr/Parser.java .