ethanpailes/regex

Try to think through EmptyLook support

ethanpailes opened this issue · 2 comments

I've just been ignoring this, but supporting emptylooks seems like a nice thing to do. It would be nice to actually be fully backwards compatible with standard regex.

If I manage to support emptylooks I could pull all the Rust source down from crates.io and extract the regular expressions to check how often my various optimizations apply.

So it seems like the VM side of things is fairly doable. Compilation, tset intersection, and so on will be harder.

Tset Intersection

Rather than just answer the questions "can this expression start with this char?" and "what chars can this expression start with?" I will need to deal with the potential position of the char and its predecessor.

Perhaps I can just conservatively say that all test chars are simultaneously at the front, the middle, and the back of the input.

I think I can drop any emptylooks at the beginning of a concatenation, and otherwise treat them as a . to be on the conservative side.
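A rough sketch of what that conservative rule might look like when computing the set of chars an expression can start with. Everything here is hypothetical: `Expr`, `FirstSet`, `nullable`, and `first_set` are toy names for illustration, not the real types in this crate, and the only goal is to show leading emptylooks being dropped from a concatenation while any other emptylook widens the answer to "any char".

```rust
use std::collections::BTreeSet;

/// Toy AST for illustration only (hypothetical, not the crate's real type).
#[derive(Debug)]
enum Expr {
    Lit(char),
    Any,       // `.`
    EmptyLook, // `^`, `$`, `\b`, ...
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
    Star(Box<Expr>),
}

/// Conservative over-approximation of the chars an expression may start with.
#[derive(Debug, PartialEq)]
enum FirstSet {
    Chars(BTreeSet<char>),
    Any,
}

impl FirstSet {
    fn union(self, other: FirstSet) -> FirstSet {
        match (self, other) {
            (FirstSet::Chars(mut a), FirstSet::Chars(b)) => {
                a.extend(b);
                FirstSet::Chars(a)
            }
            // If either side is "any char", so is the union.
            _ => FirstSet::Any,
        }
    }
}

/// Can the expression match without consuming a char (under the approximation)?
fn nullable(e: &Expr) -> bool {
    match e {
        // An emptylook that was not dropped is treated like `.`, so it "consumes".
        Expr::Lit(_) | Expr::Any | Expr::EmptyLook => false,
        Expr::Concat(es) => es
            .iter()
            .skip_while(|e| matches!(**e, Expr::EmptyLook))
            .all(nullable),
        Expr::Alt(es) => es.iter().any(nullable),
        Expr::Star(_) => true,
    }
}

fn first_set(e: &Expr) -> FirstSet {
    match e {
        Expr::Lit(c) => FirstSet::Chars(BTreeSet::from([*c])),
        // Any emptylook that reaches this point is treated as `.`.
        Expr::Any | Expr::EmptyLook => FirstSet::Any,
        Expr::Concat(es) => {
            let mut acc = FirstSet::Chars(BTreeSet::new());
            // Drop emptylooks at the beginning of the concatenation.
            for sub in es.iter().skip_while(|e| matches!(**e, Expr::EmptyLook)) {
                acc = acc.union(first_set(sub));
                if !nullable(sub) {
                    break;
                }
            }
            acc
        }
        Expr::Alt(es) => es
            .iter()
            .map(first_set)
            .fold(FirstSet::Chars(BTreeSet::new()), FirstSet::union),
        Expr::Star(inner) => first_set(inner),
    }
}
```

With this, something shaped like `^ab` still reports just `a` as its first char, while `a*\bb` falls back to "any char" because the `\b` is reached after a nullable prefix.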

Expression Intersection

TODO

Compilation

I think emptylooks get compiled as skip 0 unless they are in branch position, in which case they get compiled normally. I think this change will necessitate doing real skip fusion rather than leaning on the parser to clump literals together (#9).
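For reference, a minimal sketch of what a skip fusion pass could look like. This assumes a straight-line program with no jump targets and assumes that `skip n` just advances n positions, so adjacent skips can be summed. `Inst` and `fuse_skips` are hypothetical names for illustration, not the compiler's actual instruction set, and a real pass would also have to account for branches pointing into the middle of a run.

```rust
/// Toy straight-line instruction set (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Skip(usize),
    Char(char),
    Match,
}

/// Merge each run of adjacent `Skip` instructions into a single `Skip`.
fn fuse_skips(prog: &[Inst]) -> Vec<Inst> {
    let mut out: Vec<Inst> = Vec::with_capacity(prog.len());
    for inst in prog {
        // If the previous emitted instruction and this one are both skips,
        // fold this one into the previous instead of emitting it.
        if let (Some(Inst::Skip(prev)), Inst::Skip(n)) = (out.last_mut(), inst) {
            *prev += *n;
            continue;
        }
        out.push(inst.clone());
    }
    out
}
```

Under these assumptions a `skip 0` emitted for an emptylook between two skips disappears into the fused skip, while one sitting between non-skip instructions is left alone.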