BioJulia/Automa.jl

Taking preconditions seriously

Closed this issue · 1 comments

Automa already has preconditions - these are named Expr objects attached to edges, similar to actions. When encountering an edge, its code generators first check the byteset membership, then it evaluated the precondition's Expr, which must evaluate to a Bool, and takes the edge if evaluated to true. Preconditions are attached to regex, and apply to all transitions inside that regex.

In other words, preconditions enable you to state: Only move within this regex if this condition is fulfilled. This is particularly useful when dealing with ambiguous DFAs that can be disambiguated using preconditions - essentially, this is like manually inserting an if-statement into the finite state machine.

However, Automa's support for preconditions is shaky, and frankly, I believe, might be buggy. I did some experiments that seemed to suggest that Automa would collapse two NFA paths with different preconditions... which sort of defeats the purpose.
This needs to be thoroughly tested

  • Enable preconditions on "enter" only, in addition to the already existing "all"
  • Make sure not to collapse NFA paths with disambiguating preconditions on DFA creation. Then remove precond check in DFA validation code path
  • Have table-based generators support predond. Simply insert an if statement after state transition.
  • Check that edges with preconditions do not allow SIMD loops (just to be sure!)
  • Test, test, test and test more

In particular, I want something like this to work:

a1 = re"A"
a2 = re"A"
b1 = re"B"
b2 = re"B"
b2.when = :foo
c1 = re"C"
c1.actions[:enter] = [:bar]
Automa.compile((a1 * b1 * c1) | (a2 * b2 * re"C"))

, i.e. where otherwise indistinguishable paths a1-b1-c1 and a2-b2-c2 with conflicting actions are disambiguated by a precondition on b2.

Cool! This seems like it adds some complexity (or rather fixes something complicated... I didn't know this functionality existed), but that's probably necessary for a lot of biological formats.