BinaryAnalysisPlatform/bap

Node.update appears broken

cemerick opened this issue · 3 comments

Example:

module Test = struct
  open Core_kernel
  module G = Graphlib.Std.Graphlib.Make(Int)(String)
  let g = Graphlib.Std.Graphlib.create (module G) ~edges:[1, 2, "a"; 2, 3, "b"] ()
  let g' = G.Node.update 2 4 g
  let () = G.edges g'
        |> Sequence.iter ~f:G.Edge.(fun e -> Format.printf "%d %d %s@." (src e) (dst e) (label e))
end

This prints:

2 3 b
4 3 b

I would expect:

1 4 a
4 3 b
ivg commented

Sorry for the late reply I was on a vacation. I marked it as a bug though it is not really a bug but rather a very weird behavior. I am planning to give this operation a better semantics, which will not be that surprising.

The update operation works correctly with the labeled graphs, created with Graphlib.Labeled. A very strange semantics of nodes is inherited by graphlib from OCamlgraph. Our design constraint was to be fully compatible with OCamlgraph and their vertex interface is,

  type t

  include COMPARABLE with type t := t

  (** Vertices are labeled. *)

  type label
  val create : label -> t
  val label : t -> label

The pair of functions label -> t and t -> label establishes isomorphism between a label and vertex, well if we will assume that create is injective and pure.

As a result, in Graphlib we have equality between labels and nodes. Even in the labeled graph both labels and nodes are represented with the same type. The update operation makes sense only when both nodes are equal but have different labels, e.g.,

G.update {node = 1; node_label = "1"} {node = 1; node_label = "one"} g

So what the update operation currently does is that assuming that n and n' are equal but different it updates all n'. Yes, it sounds as it sounds. It sounds very broken. A labeled_node is compare by node so {node = 1; node_label = "1"} and {node = 1; node_label = "one"} are equal but different.

When we have an unlabeled graph, saying G.Node.update 2 4 g is the same as saying that 2 = 4 so it doesn't make any sense. A node is equal to its label, so G.Node.label x is always x and indeed if you will take a look into the formal definition, the unlabeled graph is denoted as,

        With this general framework an unlabeled graph can be
        represented as:

        {v G = (N, E, N, E, ν = λx.x, ε = λx.x) v}

So the ν function is an identity.

I understand that this all sounds weird and wtfs/minute rate is very high but this is indeed the intended and specified semantics.

I understand, that what you really want is to change node 2 to node 4 so that every occurrence of 2 is substituted with 4. You may even want to update a node to an existing node for coalescing them, which also makes sense. But we don't have an operation for that. Now I am thinking on if we can abuse update for that and how well it will play with the labeled graphs.

Besides, if what you want is actually to update the label, then you should use the labeled graph.

I appreciate your consideration. Your reply here reminds me of an SO answer (/rant 😉) of yours I remember from a while ago, https://stackoverflow.com/a/47679557/11809

It sounds like update (at least as it currently sits) should only be part of the Graphlib.Labeled signature then?

My workaround at the moment (which is perfectly fine, even from a perf perspective) is to fold through all of the existing node's inputs and outputs and replace them with edges to the replacement node. I suppose I'm operating with the same mindset as ocamlgraph's "original sin", and in this particular case it'd be simpler / more efficient to swap a label; but I'm disinclined to have synthetic/semantically void node identifiers (ints I suppose), and the labelled graph API comes with additional awkwardness.

ivg commented

That's a good answer, unfortunately, I forgot about it)) After a lot of consideration, I ended up on #1340. The G.Node.update was broken in a sense that for a graph with unlabeled nodes it shouldn't at least destroy the graph and remove random nodes. So the new behavior is that G.Node.update n l g returns g for all n an l.

It sounds like update (at least as it currently sits) should only be part of the Graphlib.Labeled signature then?

Yes, but we don't have such a signature, neither does OCamlgraph.