Why not just use "dk.brics.automaton.BasicOperations.union(Collection<Automaton> l)" to merge multiple automata into one in the construction of MultiPatternAutomaton?
andrew916 opened this issue · 2 comments
We would still need forward and backward automata to find the boundaries of the matched text in MultiPatternSearcher, but the implementation of MultiPatternAutomaton would become simpler and more straightforward.
Cons: dk.brics.automaton.State must be modified to store the pattern id, which is unique to each automaton.
As you pointed out, I need to keep track of the accepted pattern id for each state (after the union, it becomes a list of pattern ids). I am not sure whether this can be achieved without modifying dk.brics, but I might have dropped this track a bit too early.
If you find an elegant solution, I will gladly accept a pull request :)
It would also be awesome to get all the automaton simplification algorithms to play along well with multiregexp.
I found that what is done in MultiPatternAutomaton.make(...) is very much like what is done in dk.brics.automaton.BasicOperations.determinize(...).
But yes, the multi-pattern bookkeeping cannot be done without modifying dk.brics.
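To make the idea concrete, here is a rough, self-contained sketch of a determinize-style subset construction in which every resulting state carries the set of pattern ids it accepts. It deliberately does not use dk.brics at all: `MultiPatternSketch`, `NState`, `DState`, `literal`, `determinize` and `match` are hypothetical names invented for this illustration, the patterns are limited to plain literals, and this is not the actual code of MultiPatternAutomaton.make(...) or BasicOperations.determinize(...).

```java
import java.util.*;

/** Illustration only: union + subset construction that keeps accepted pattern ids per state. */
final class MultiPatternSketch {

    /** NFA state: character transitions plus the id of the pattern it accepts (-1 if none). */
    static final class NState {
        final Map<Character, List<NState>> next = new HashMap<>();
        int acceptedPattern = -1;
    }

    /** DFA state: after the "union", a state may accept several patterns at once. */
    static final class DState {
        final Map<Character, DState> next = new HashMap<>();
        final SortedSet<Integer> acceptedPatterns = new TreeSet<>();
    }

    /** Builds a linear NFA accepting exactly the given literal (stand-in for a real regexp). */
    static NState literal(String s, int patternId) {
        NState start = new NState();
        NState cur = start;
        for (char c : s.toCharArray()) {
            NState n = new NState();
            cur.next.computeIfAbsent(c, k -> new ArrayList<>()).add(n);
            cur = n;
        }
        cur.acceptedPattern = patternId;
        return start;
    }

    /** Creates the DFA state for a subset and collects the pattern ids of its accepting members. */
    private static DState newDState(Set<NState> subset) {
        DState d = new DState();
        for (NState s : subset) {
            if (s.acceptedPattern >= 0) d.acceptedPatterns.add(s.acceptedPattern);
        }
        return d;
    }

    /** Subset construction starting from the start states of all pattern automata. */
    static DState determinize(List<NState> starts) {
        Map<Set<NState>, DState> seen = new HashMap<>();
        Deque<Set<NState>> work = new ArrayDeque<>();
        Set<NState> init = new HashSet<>(starts);
        DState root = newDState(init);
        seen.put(init, root);
        work.add(init);
        while (!work.isEmpty()) {
            Set<NState> subset = work.poll();
            DState from = seen.get(subset);
            // Group the successors of every member of the subset by input character.
            Map<Character, Set<NState>> moves = new HashMap<>();
            for (NState s : subset) {
                for (Map.Entry<Character, List<NState>> e : s.next.entrySet()) {
                    moves.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
                }
            }
            for (Map.Entry<Character, Set<NState>> e : moves.entrySet()) {
                Set<NState> target = e.getValue();
                DState to = seen.get(target);
                if (to == null) {
                    to = newDState(target);
                    seen.put(target, to);
                    work.add(target);
                }
                from.next.put(e.getKey(), to);
            }
        }
        return root;
    }

    /** Returns the ids of all patterns that accept the whole input (empty set if none). */
    static SortedSet<Integer> match(DState root, String input) {
        DState cur = root;
        for (char c : input.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return new TreeSet<>();
        }
        return cur.acceptedPatterns;
    }
}
```

For example, calling `determinize` on the start states of `literal("ab", 0)`, `literal("ab", 1)` and `literal("abc", 2)` yields a single automaton where `match(root, "ab")` reports both ids 0 and 1. The per-state id set is exactly the extra information that a plain union does not preserve, which is why some change to the state representation seems unavoidable.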