h0tk3y/better-parse

Tokenizer regression in 0.4.0

Closed · 1 comment

I'm trying to upgrade better-parse in https://github.com/mipt-npm/kmath/tree/extended-grammar from 0.4.0-alpha-3 to 0.4.0.

The tokens are:
[image: token definitions]

Before the update, the string "2+2*(2+2)" was lexed as:

[num for "2" at 0 (1:1), plus for "+" at 1 (1:2), num for "2" at 2 (1:3), mul for "*" at 3 (1:4), lpar for "(" at 4 (1:5), num for "2" at 5 (1:6), plus for "+" at 6 (1:7), num for "2" at 7 (1:8), rpar for ")" at 8 (1:9)]

After the update, the same string is lexed as:

[num@1 for "2" at 0 (1:1), num@2 for "+2" at 1 (1:2), num@3 for "*(2" at 3 (1:4), num@4 for "+2" at 6 (1:7), rpar@5 for ")" at 8 (1:9)]

The num token is defined with the regex "[\\d.]+(?:[eE]-?\\d+)?".toRegex(), which cannot match "+", "*" or "(", so num matches such as "+2" and "*(2" make no sense. This is a regression.
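For reference, here is a minimal sketch that checks the num pattern on its own, using only the Kotlin standard library and no better-parse code. The helper name matchesWhole is made up for illustration. It only shows that the pattern itself rejects the strings reported above; it does not try to reproduce the tokenizer bug or explain its cause.

```kotlin
// The num pattern from the report, checked with the Kotlin stdlib only.
val num = "[\\d.]+(?:[eE]-?\\d+)?".toRegex()

// Hypothetical helper: true only if the whole text is a single num match.
fun matchesWhole(text: String): Boolean = num.matchEntire(text) != null

fun main() {
    val input = "2+2*(2+2)"

    // The lexemes reported after the update are not valid num matches.
    println(matchesWhole("+2"))   // false
    println(matchesWhole("*(2"))  // false

    // Plain numbers, including exponent notation, are.
    println(matchesWhole("2"))       // true
    println(matchesWhole("1.5e-3"))  // true

    // Searching from index 1 skips the "+" and finds "2" at index 2.
    val m = num.find(input, 1)
    println("${m?.value} at ${m?.range?.first}")  // 2 at 2
}
```

So the pattern behaves as expected in isolation, which suggests the problem lies in how 0.4.0 applies it rather than in the regex.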

By the way, replacing the token function with regexToken produces the same wrong behavior.

Thanks @CommanderTvis for the report, I'll include the fix in the 0.4.1 update. As a workaround, you can use the regexToken(...) overload that accepts a String pattern rather than a Regex, as that overload was not affected.