cs-au-dk/dk.brics.automaton

Complement pattern match problem

RuralHunter opened this issue · 4 comments

With standard regular expressions, the pattern "(?!201[0-8])\d{4}" matches "2022" but not "2012". However, with automaton, the complement pattern "~(201[0-8])\d{4}" matches both "2022" and "2012". I don't know why. How can I write a pattern that behaves the same as the regular expression "(?!201[0-8])\d{4}"?

In this library I don't believe \d has any special meaning, so [0-9] should be used instead.
I think ~(201[0-8])[0-9]{4} is looking for the complement of an automaton that matches certain 8-digit strings starting with 201, i.e. any 4-digit string would be a match.

I think an expression that gives what you're describing is:
[0-9]{4}&~(201[0-8])

& is the intersection operator, and requires the patterns on both sides to match.
i.e. it must be 4 digits and must not be 201[0-8]
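
To make the semantics concrete, here is a minimal sketch that models the language of [0-9]{4}&~(201[0-8]) using only java.util.regex, not dk.brics.automaton itself. The class and method names are hypothetical, and it assumes that an automaton match is a whole-string match, which corresponds to Matcher.matches().

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of dk.brics.automaton: it only models the
// language of [0-9]{4}&~(201[0-8]) with whole-string java.util.regex matches.
class IntersectionSketch {
    private static final Pattern FOUR_DIGITS = Pattern.compile("[0-9]{4}");
    private static final Pattern EXCLUDED = Pattern.compile("201[0-8]");

    // Intersection: the string must be in the left language AND in the
    // complement of the right one (it must not fully match 201[0-8]).
    static boolean accepts(String s) {
        return FOUR_DIGITS.matcher(s).matches()
                && !EXCLUDED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2022", "2012", "2019", "20220"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Under this model "2022" and "2019" are accepted, while "2012" (excluded) and "20220" (not 4 digits) are rejected, which is the behaviour of the lookahead pattern from the question when it is applied to the whole string.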

Thanks, Daniel. (Note that the complement operator ~ has higher precedence than concatenation.)

@RuralHunter, please see the javadoc, and please don't use GitHub issue tracking for general support questions about how to use the package.


Whoops, you're right: operator precedence means that ~(201[0-8])[0-9]{4} matches any string that's not 201[0-8], followed by any 4 digits.
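
A small sketch of why that reading accepts "2012": the empty string is itself "not 201[0-8]", so it can serve as the complemented prefix in front of the 4 digits. The code below models the concatenation with java.util.regex only; it does not call dk.brics.automaton, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical model of ~(201[0-8])[0-9]{4}, read as (~(201[0-8])) followed
// by [0-9]{4}. It uses java.util.regex only, not dk.brics.automaton.
class PrecedenceSketch {
    private static final Pattern EXCLUDED = Pattern.compile("201[0-8]");
    private static final Pattern FOUR_DIGITS = Pattern.compile("[0-9]{4}");

    // Concatenation: s is accepted if it can be split into prefix + suffix
    // where prefix is NOT a full match of 201[0-8] and suffix is 4 digits.
    static boolean accepts(String s) {
        for (int i = 0; i <= s.length(); i++) {
            String prefix = s.substring(0, i);
            String suffix = s.substring(i);
            if (!EXCLUDED.matcher(prefix).matches()
                    && FOUR_DIGITS.matcher(suffix).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2012", "2022", "20102012", "20192012"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

In this model "2012" and "2022" are both accepted via the empty prefix, "20192012" is accepted because "2019" is not in 201[0-8], and "20102012" is rejected because its only possible prefix, "2010", is excluded.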

ok, thanks guys.
@amoeller, got it. I have read the javadoc, but some things are still confusing. I suggest adding some examples of the patterns supported by automaton to the doc/javadoc, especially those that differ from normal regular expressions.