ColinEberhardt/assemblyscript-regex

Add NFA -> DFA transformation + optimizations in this domain

MaxGraey opened this issue · 6 comments

That's a really interesting article - thanks.

Just FYI - I've been exploring a multi-state NFA algorithm, which is conceptually similar to a DFA (it is in some senses an emulated DFA):

https://github.com/ColinEberhardt/assemblyscript-regex/tree/multi-state-NFA
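For readers unfamiliar with the idea, here is a minimal sketch of multi-state NFA simulation. It is not the code from the branch above: the NfaNode class, its fields, and the closure and matches helpers are all hypothetical names chosen for illustration. The matcher tracks the set of all states the NFA could be in and advances that whole set one character at a time, which is why it behaves like a DFA built on the fly and never backtracks.

```typescript
// Hypothetical NFA node shape (an assumption, not the library's State class).
class NfaNode {
  isEnd: boolean = false;
  epsilon: NfaNode[] = [];
  next: Map<string, NfaNode> = new Map<string, NfaNode>();
}

// Add `start` and everything reachable from it via epsilon transitions to `out`.
function closure(start: NfaNode, out: Set<NfaNode>): void {
  const stack: NfaNode[] = [start];
  while (stack.length > 0) {
    const node = stack.pop() as NfaNode;
    if (out.has(node)) continue;
    out.add(node);
    for (let i = 0; i < node.epsilon.length; i++) {
      stack.push(node.epsilon[i]);
    }
  }
}

// Whole-input match: advance the full set of live states per character.
function matches(start: NfaNode, input: string): boolean {
  let current = new Set<NfaNode>();
  closure(start, current);
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const following = new Set<NfaNode>();
    current.forEach((node) => {
      const target = node.next.get(ch);
      if (target !== undefined) closure(target, following);
    });
    current = following;
    if (current.size === 0) return false; // no live states left
  }
  let accepted = false;
  current.forEach((node) => {
    if (node.isEnd) accepted = true;
  });
  return accepted;
}
```

Each distinct set of live states corresponds to one DFA state, so the work per input character is bounded by the number of NFA states regardless of the pattern.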

However, it doesn't appear to be any faster in my benchmark tests!

As I understand it, the NFA -> DFA transformation does not help in some scenarios.

I believe the principal advantage is that DFAs do not backtrack. However, many simple regexes do not cause any significant backtracking. This is why I'm keen to add some more complex expressions to the benchmark tests!

Also, it seems this part could be more efficient:

function addNextState(
  state: State,
  nextStates: State[],
  visited: State[]
): void {
  if (state.epsilonTransitions.length > 0) {
    for (let i = 0; i < state.epsilonTransitions.length; i++) {
      const st = state.epsilonTransitions[i];
      if (!visited.includes(st)) {
        visited.push(st);
        addNextState(st, nextStates, visited);
      }
    }
  } else {
    nextStates.push(state);
  }
}

by avoiding recursion and using an explicit stack / queue instead. Also, visited would be better as a Set than an array, since includes on an array is a linear scan.
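A rough sketch of that suggestion, assuming State only needs the epsilonTransitions field shown in the snippet (the minimal State class and the name addNextStateIterative are made up for this example). It uses an explicit stack and a Set, and marks states as visited when they are popped so that leaf states are emitted in the same order as the recursive version. One difference to be aware of: the start state is also added to visited here, so the caller should not pass a start state that is already in the set.

```typescript
// Hypothetical minimal State shape, assumed from the snippet above.
class State {
  epsilonTransitions: State[] = [];
}

function addNextStateIterative(
  start: State,
  nextStates: State[],
  visited: Set<State>
): void {
  const stack: State[] = [start];
  while (stack.length > 0) {
    const state = stack.pop() as State;
    // Set lookup is constant time on average, unlike Array.includes.
    if (visited.has(state)) continue;
    visited.add(state);

    const eps = state.epsilonTransitions;
    if (eps.length > 0) {
      // Push in reverse so transitions are explored left to right.
      for (let i = eps.length - 1; i >= 0; i--) {
        stack.push(eps[i]);
      }
    } else {
      // No epsilon transitions: this is a state that consumes input.
      nextStates.push(state);
    }
  }
}
```

Whether this is measurably faster would need to be checked in the benchmark; for small NFAs the array scan may be cheap enough that the difference is negligible.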