Suboptimal DFA construction when epsilon loop in NFA
Closed this issue · 1 comments
jakobnissen commented
Construct an NFA with:
import Automa
const re = Automa.RegExp
nfa = Automa.re2nfa(re.rep(re.rep(re.any()) * re.opt(re.parse("A"))))
Notice that this regex matches literally any sequence of bytes.
View the constructed NFA:
open("/tmp/nfa.dot", "w") do file
println(file, Automa.nfa2dot(nfa))
end
# in shell, do "dot -Tsvg /tmp/nfa.dot > /tmp/nfa.svg
Note that it has a loop of epsilon edges. Now construct a DFA (or Machine, whatever) from that NFA and visualise it:
machine = Automa.dfa2machine(Automa.nfa2dfa(nfa))
open("/tmp/machine.dot", "w") do file
println(file, Automa.machine2dot(machine))
end
That's wrong. It doesn't even make any sense. Those three nodes are completely indistinguishable. It should look like this:
The mistake is somewhere in nfa2dfa
where it mistakenly believes the DFA should have 3 nodes instead of 1.
jakobnissen commented
Nevermind, it turns out that instantiating a Machine
using dfa2machine
does not optimize it. If you use reduce_nodes
, it will be correctly optimized. This happens when using Automa.compile
by default.