coli-saar/alto

SGraph hashcode caching does not work properly when changing the label of GraphNode

jgroschwitz opened this issue · 3 comments

The SGraph class caches its hashcode and only recomputes it if the invalidate() function was called (e.g. when adding a node).

However, a natural way (at least for me) to change a label of a node is this snippet, assuming g is an SGraph object:

g.getNode(nodeName).setLabel(newLabel)

Doing this does not call g.invalidate() (it can't, of course), and g remains with a wrong cached hashcode without knowing it. There is no warning of this problem in any of the functions used. This has caused very hard to find bugs for me now twice, and I'm afraid it will for others in the future.

A temporary workaround is to call

g.addNode(nodeName, newLabel)

which actually just changes the label of the node with name nodeName if that node already exists in the graph.

I see three ways of going about it:

  1. add documentation to SGraph#getNode and GraphNode#setLabel that warns about the issue, recommend the workaround and maybe change the name of the SGraph#addNode function to reflect its functionality (or create a SGraph#changeNodeLabel function).
  2. remove the caching of the SGraph hashcode. This will reduced performance, but maybe not by much and it would get rid of the problem for good.
  3. Make g.getNode(nodeName).setLabel(newLabel) impossible by removing the getNode function or the setLabel function; i.e. all changes to the graph must come from a function of the graph itself (which can call invalidate()). More work than option (1) and might remove some useful functionality, but more foolproof than (1).

@akoehn @alexanderkoller what do you think? I don't quite know which is the best option (or if there is a better one).

To reproduce the problem:

        SGraph graph1 = codec.read("(n_3<root> / --LEX--)");
        System.out.println(graph1.hashCode());

        SGraph graph2 = codec.read("(n_3<root> / bla)");
        System.out.println(graph2.hashCode()); // this is correct
        graph2.getNode("n_3").setLabel("--LEX--");
        System.out.println(graph3.hashCode()); // this is incorrectly still the hashcode from above; graph1 and graph2 have different hashcodes
        System.out.println(graph3.toIsiAmrStringWithSources()); // but the graphs are the same
        System.out.println(graph1.toIsiAmrStringWithSources());

        SGraph graph3 = codec.read("(n_3<root> / bla)");
        System.out.println(graph3.hashCode()); // this is correct, like graph2 above
        graph4.addNode("n_3", "--LEX--"); // this changes the node label and invalidates the graph
        System.out.println(graph4.hashCode()); // now the hashcode
        System.out.println(graph4.toIsiAmrStringWithSources());

Note that

As I just said, my suggestion would be to add a method SGraph#setNodeLabel, which sets the label and calls invalidate.

Regarding @akoehn's suggestion to make SGraph#getNode#setLabel impossible, I think this is a good idea, but we need to be careful about performance. Perhaps the easiest thing to do would be to make GraphNode#setLabel accessible only to the package. Thus GraphNodes would become immutable except from within the graph package (where SGraph#setNodeLabel can modify it), and everyone else can use setNodeLabel.

We could also only wrap it if assertions are enabled. Then performance would not be a problem.

I like the idea of making GraphNode#setLabel only accessible to the package; it's simple and does the job well enough. I'll check its out-of package usages for whether they can be changed. I'll add the SGraph#setNodeLabel function too (and change SGraph#addNode so that it no longer has relabeling functionality), and check if there is a similar issue with graph edges. I'm in the middle of a different project right now and don't want my coding wires crossed, so I'll do this early next week.