evolve75/RubyTree

Cycle detection

allquixotic opened this issue · 1 comments

Cycle detection should be fairly easy to add using Ruby's built-in TSort module. It will reduce performance unless done carefully, but I managed to monkey patch a version of RubyTree that works with polytrees and errors out if you try to add a node that would create a cycle (vanilla RubyTree produces an infinite loop).

I have to detect cycles as a business requirement of my code, since I'm building a tree representation of untrusted data that SHOULD NOT have cycles, but may anyway (and if a cycle is detected, we need to go yell at someone, basically).

Any method that enumerates, iterates, or otherwise traverses the tree needs to implement cycle detection, or you will potentially get an infinite loop or (in the case of recursion) an infinite recursion and stack overflow.

At first I started by including TSort mixin into the TreeNode class, then I implemented the required tsort methods as follows:

def tsort_each_node(&block)
  self.each(&block)
end

def tsort_each_child(node, &block)
  node.children(&block)
end

Then I augmented the add method as follows, right before returning child:

# ... all the stuff before the last line of the method
#CYCLE DETECTION - raises TSort::Cyclic if this caused a cycle
  begin
    root.tsort()
  rescue TSort::Cyclic => exce
    self.remove!(child)
    raise exce
  end

Funnily enough, this doesn't work, because somewhere in this method before the return call, to_s() is getting implicitly invoked (perhaps that's a bug in and of itself, since it causes unnecessary string building when we aren't even asking about a string representation of the node). to_s() in turn asks for the size(), which can get stuck in an infinite loop on a cycle. So can the root accessor. That one is particularly sneaky (IMO).

Perhaps to avoid the performance hit this could only be enabled upon user request, but I don't really have time to develop the feature and test it fully, especially not with the full robustness expected (where if cycle detection is disabled, there is no added overhead, but if it's enabled, every possible functionality that could cause cycles to be found will throw an exception).

Anyway, just wanted to point out that it should at least be uncomplicated from an algorithmic perspective to add cycle detection when we have a reasonably efficient cycle detection algorithm built into Ruby. It's very easy to integrate it into RubyTree, at least for detecting cycles on add().

Sean,

Thanks for the suggestion. Definitely worth implementing. As you rightly pointed, we need to make sure that:

  1. There is no performance hit, if possible
  2. If there is a performance problem due to the cycle detection, then it needs to be a user invoked function, and
  3. The bug for the sneaky to_s() call needs to be resolved.

I will check this out (especially the spurious call to to_s()). Keep tuned!