fulmicoton/multiregexp

Why not just use "dk.brics.automaton.BasicOperations.union(Collection<Automaton> l)" to merge multiple automata into one when constructing MultiPatternAutomaton?

andrew916 opened this issue · 2 comments

We would still need forward and backward automata to find the boundaries of the matched text in MultiPatternSearcher, but the implementation of MultiPatternAutomaton would become simpler and more straightforward.

Con: dk.brics.automaton.State would have to be modified to store the pattern id, which is unique to each automaton.
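To make the forward/backward remark concrete, here is a minimal sketch that uses a tiny hand-rolled, epsilon-free NFA instead of dk.brics classes. BoundarySketch, Nfa, Edge and findFirst are hypothetical names, and this is not MultiPatternSearcher's actual code. The forward automaton, run over the text with its start states re-added at every position, only tells us where some match ends. Running the reversed automaton leftwards from that position recovers where the match starts. Records require Java 16+.

```java
import java.util.*;

/** Hypothetical sketch: find a match end with a forward pass, then its start with a reversed pass. */
public class BoundarySketch {
    record Edge(int from, char label, int to) {}

    /** Epsilon-free NFA; several start states are allowed, which makes reversal trivial. */
    record Nfa(Set<Integer> starts, Set<Integer> accepts, List<Edge> edges) {
        /** Flip every edge and swap starts/accepts: the result accepts the reversed language. */
        Nfa reverse() {
            List<Edge> rev = new ArrayList<>();
            for (Edge e : edges) rev.add(new Edge(e.to(), e.label(), e.from()));
            return new Nfa(accepts, starts, rev);
        }

        /** All states reachable from the current set by reading character c. */
        Set<Integer> step(Set<Integer> current, char c) {
            Set<Integer> next = new HashSet<>();
            for (Edge e : edges)
                if (e.label() == c && current.contains(e.from())) next.add(e.to());
            return next;
        }

        boolean accepting(Set<Integer> states) {
            for (int s : states) if (accepts.contains(s)) return true;
            return false;
        }
    }

    /**
     * Returns the half-open range {start, end} of the match that ends earliest in the text,
     * extended as far left as possible, or null when nothing matches.
     */
    static int[] findFirst(Nfa forward, String text) {
        // Forward pass: re-adding the start states at every position simulates ".*P",
        // so the first accepting position is the earliest match end.
        Set<Integer> cur = new HashSet<>(forward.starts());
        int end = forward.accepting(cur) ? 0 : -1;
        for (int i = 0; end < 0 && i < text.length(); i++) {
            Set<Integer> stepped = forward.step(cur, text.charAt(i));
            if (forward.accepting(stepped)) end = i + 1;
            stepped.addAll(forward.starts());
            cur = stepped;
        }
        if (end < 0) return null;

        // Backward pass: run the reversed automaton leftwards from the end;
        // the leftmost accepting position is the start of the longest match ending there.
        Nfa backward = forward.reverse();
        Set<Integer> back = new HashSet<>(backward.starts());
        int start = backward.accepting(back) ? end : -1;
        for (int i = end - 1; i >= 0 && !back.isEmpty(); i--) {
            back = backward.step(back, text.charAt(i));
            if (backward.accepting(back)) start = i;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Hand-built NFA for "ab+": 0 -a-> 1 -b-> 2, 2 -b-> 2
        Nfa abPlus = new Nfa(Set.of(0), Set.of(2), List.of(
                new Edge(0, 'a', 1), new Edge(1, 'b', 2), new Edge(2, 'b', 2)));
        System.out.println(Arrays.toString(findFirst(abPlus, "xxabbby"))); // [2, 4]
    }
}
```

With the text "xxabbby" this prints [2, 4]: the forward pass stops at the earliest match end, so it reports "ab" rather than "abbb". Other match semantics (for example leftmost-longest) would need a different forward-pass rule.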

As you pointed out, I need to keep track of the accepted pattern id for each state (after union, it becomes a list of pattern ids). I am not sure this is possible without modifying dk.brics, but I might have abandoned this approach a bit too early.

If you find an elegant solution, I will gladly accept a pull request :)
It would also be awesome to get all the automaton simplification algorithms to play well with multiregexp.
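One way to picture the bookkeeping described above, as a hedged sketch: a union of epsilon-free NFAs done by renumbering states and keeping every start state, with the pattern id of each accepting state held in a side map rather than inside the state objects. Everything here (LabeledUnion, Nfa, union, matchingIds, literal) is hypothetical and does not use dk.brics. The sketch does not show whether such a side map could survive dk.brics's own union and minimization.

```java
import java.util.*;

/** Hypothetical sketch: union of pattern NFAs whose accepting states remember their pattern id. */
public class LabeledUnion {
    record Edge(int from, char label, int to) {}

    /** Epsilon-free NFA over states 0..size-1; acceptId maps accepting state -> pattern id. */
    record Nfa(int size, Set<Integer> starts, Map<Integer, Integer> acceptId, List<Edge> edges) {}

    /** Union by renumbering each part's states and keeping all start states (no epsilon moves). */
    static Nfa union(List<Nfa> parts) {
        Set<Integer> starts = new HashSet<>();
        Map<Integer, Integer> acceptId = new HashMap<>();
        List<Edge> edges = new ArrayList<>();
        int offset = 0;
        for (Nfa p : parts) {
            for (int s : p.starts()) starts.add(s + offset);
            for (var e : p.acceptId().entrySet()) acceptId.put(e.getKey() + offset, e.getValue());
            for (Edge e : p.edges()) edges.add(new Edge(e.from() + offset, e.label(), e.to() + offset));
            offset += p.size();
        }
        return new Nfa(offset, starts, acceptId, edges);
    }

    /** Simulates the union on a whole string and returns the ids of every pattern matching it. */
    static SortedSet<Integer> matchingIds(Nfa nfa, String text) {
        Set<Integer> current = new HashSet<>(nfa.starts());
        for (char c : text.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (Edge e : nfa.edges())
                if (e.label() == c && current.contains(e.from())) next.add(e.to());
            current = next;
        }
        // Several accepting states can be active at once, so the result is a set of ids.
        SortedSet<Integer> ids = new TreeSet<>();
        for (int s : current)
            if (nfa.acceptId().containsKey(s)) ids.add(nfa.acceptId().get(s));
        return ids;
    }

    /** Builds an NFA for a single literal string, tagged with the given pattern id. */
    static Nfa literal(String word, int patternId) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) edges.add(new Edge(i, word.charAt(i), i + 1));
        return new Nfa(word.length() + 1, Set.of(0), Map.of(word.length(), patternId), edges);
    }

    public static void main(String[] args) {
        Nfa p0 = literal("ab", 0);
        // Pattern 1: "ab*", hand-built: 0 -a-> 1, 1 -b-> 1, state 1 accepts
        Nfa p1 = new Nfa(2, Set.of(0), Map.of(1, 1), List.of(new Edge(0, 'a', 1), new Edge(1, 'b', 1)));
        Nfa both = union(List.of(p0, p1));
        System.out.println(matchingIds(both, "ab"));  // [0, 1]
        System.out.println(matchingIds(both, "abb")); // [1]
    }
}
```

In the union NFA each accepting state still carries a single id. Sets of ids only appear when several states are active at the same time, which is exactly what determinization turns into a per-state list of pattern ids.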

I found that what MultiPatternAutomaton.make(...) does is very similar to dk.brics.automaton.BasicOperations.determinize(...).

But yes, multi-pattern matching cannot be done without modifying dk.brics.
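To illustrate the comparison with determinize(...), here is a minimal sketch of the textbook subset construction in which every DFA state also records the set of pattern ids accepted by the NFA states it stands for. The input NFA is epsilon-free, so no closure step is needed. It is a self-contained toy (LabeledDeterminize, DfaState, determinize and run are hypothetical names) and does not claim to reproduce MultiPatternAutomaton.make(...) or dk.brics's determinize line by line.

```java
import java.util.*;

/** Hypothetical sketch: subset construction that labels each DFA state with accepted pattern ids. */
public class LabeledDeterminize {
    record Edge(int from, char label, int to) {}

    /** Epsilon-free NFA; acceptId maps accepting state -> pattern id. */
    record Nfa(Set<Integer> starts, Map<Integer, Integer> acceptId, List<Edge> edges) {}

    /** A DFA state: the NFA states it stands for, its transitions, and its accepted pattern ids. */
    record DfaState(Set<Integer> nfaStates, Map<Character, Integer> next, SortedSet<Integer> acceptedIds) {}

    static List<DfaState> determinize(Nfa nfa) {
        List<DfaState> dfa = new ArrayList<>();
        Map<Set<Integer>, Integer> index = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        addState(nfa, new TreeSet<>(nfa.starts()), dfa, index, work);
        while (!work.isEmpty()) {
            DfaState d = dfa.get(work.pop());
            // Group the successor NFA states by input character.
            Map<Character, Set<Integer>> moves = new TreeMap<>();
            for (Edge e : nfa.edges())
                if (d.nfaStates().contains(e.from()))
                    moves.computeIfAbsent(e.label(), k -> new TreeSet<>()).add(e.to());
            for (var m : moves.entrySet()) {
                int target = index.containsKey(m.getValue())
                        ? index.get(m.getValue())
                        : addState(nfa, m.getValue(), dfa, index, work);
                d.next().put(m.getKey(), target);
            }
        }
        return dfa;
    }

    /** Registers a new DFA state; its ids are the union of the ids of its accepting NFA states. */
    static int addState(Nfa nfa, Set<Integer> states, List<DfaState> dfa,
                        Map<Set<Integer>, Integer> index, Deque<Integer> work) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (int s : states)
            if (nfa.acceptId().containsKey(s)) ids.add(nfa.acceptId().get(s));
        int id = dfa.size();
        dfa.add(new DfaState(states, new TreeMap<>(), ids));
        index.put(states, id);
        work.push(id);
        return id;
    }

    /** Runs the DFA over a whole string; returns the accepted ids, or an empty set on a dead end. */
    static SortedSet<Integer> run(List<DfaState> dfa, String text) {
        int s = 0;
        for (char c : text.toCharArray()) {
            Integer t = dfa.get(s).next().get(c);
            if (t == null) return new TreeSet<>();
            s = t;
        }
        return dfa.get(s).acceptedIds();
    }

    public static void main(String[] args) {
        Nfa union = new Nfa(Set.of(0, 3), Map.of(2, 0, 4, 1), List.of(
                new Edge(0, 'a', 1), new Edge(1, 'b', 2),   // pattern 0: "ab"
                new Edge(3, 'a', 4), new Edge(4, 'b', 4))); // pattern 1: "ab*"
        List<DfaState> dfa = determinize(union);
        System.out.println(dfa.size());      // 4
        System.out.println(run(dfa, "ab"));  // [0, 1]
        System.out.println(run(dfa, "abb")); // [1]
    }
}
```

For the union of "ab" (id 0) and "ab*" (id 1), the construction yields 4 DFA states. The state reached on "ab" accepts [0, 1], and the one reached on "abb" accepts [1].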