zkemail/zk-regex

Rewrite regex to work with arbitrary subgroup extraction

Divide-By-0 opened this issue · 1 comments

Jern has a good suggestion for improvement:
So, the demo I and Anka built is based on the assumption that we try to match the subgroup from the minimized DFA (which is unique to each regex) However, since we label tag after it’s minimized, the tag is not 1-1 and can cause funny behavior in some cases.

Hence, to be more general, we need to start appending tag since NFA states. Now, if we literally use NFA with tags and put it in circom, it is fine tracking states, we don’t need registers. But it will take expo time due to NFA nature when extracting the tag. To collapse NFA to DFA and tracking the tag, most papers use register because it can collapse to a very succinct state machine since now you don’t need to care what index that the tag gets assigned to, you just keep the register number, and assign the index to it as the Algo reads the input.

That being said, the tagged DFA is definitely overkilled since it is literally tagged between any alphabet in regex, not the subgroup. So, I’m weakening the assumption and care only the subgroup extracting with () stuffs, and make the version using register.

At first, I thought register in circom shouldn’t be bad (and if we do that, it can easily be applied to almost all versions of tagged stuffs), but it doesn’t seem so, so I will try modify it to be ok without it, so we can stick to the transition version that u and Sora already built (I think it should work using table stuffs) but either way, the naive minimized DFA will never work when we want subgroup extraction.

Edit for https://github.com/zk-email-verify/zk-regex as well.

This is done with a frontend at https://github.com/JernKunpittaya/full_zk_regex

Unfortunately it doesn't work all the time, but hopefully can be improved!