Tokenizer regression in 0.4.0
CommanderTvis opened this issue · 1 comments
I'm trying to upgrade better-parse in https://github.com/mipt-npm/kmath/tree/extended-grammar from 0.4.0-alpha-3 to 0.4.0.
Before the update, the string "2+2*(2+2)" was lexed as:
[num for "2" at 0 (1:1), plus for "+" at 1 (1:2), num for "2" at 2 (1:3), mul for "*" at 3 (1:4), lpar for "(" at 4 (1:5), num for "2" at 5 (1:6), plus for "+" at 6 (1:7), num for "2" at 7 (1:8), rpar for ")" at 8 (1:9)]
After the update, the same string is lexed as:
[num@1 for "2" at 0 (1:1), num@2 for "+2" at 1 (1:2), num@3 for "*(2" at 3 (1:4), num@4 for "+2" at 6 (1:7), rpar@5 for ")" at 8 (1:9)]
A "+" inside a num token, whose regex is "[\\d.]+(?:[eE]-?\\d+)?".toRegex(), makes no sense, so this is a regression.
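The pattern itself can be checked in isolation with the Kotlin standard library alone, which suggests the problem is in the tokenizer and not in the regex. This is only a minimal sketch; numPattern and numMatches are illustrative names, not part of better-parse.

```kotlin
// The num pattern from the report, compiled with the standard library only.
val numPattern: Regex = "[\\d.]+(?:[eE]-?\\d+)?".toRegex()

// Hypothetical helper: every substring of `input` that the pattern matches.
fun numMatches(input: String): List<String> =
    numPattern.findAll(input).map { it.value }.toList()

fun main() {
    // Only the four "2" literals match; no match contains "+", "*" or "(".
    println(numMatches("2+2*(2+2)"))
    // The lexemes reported after the update are not full matches of the pattern.
    println(numPattern.matches("+2"))   // false
    println(numPattern.matches("*(2"))  // false
}
```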
By the way, replacing the token function with regexToken produces the same wrong behavior.
Thanks @CommanderTvis for the report, I'll include the fix in the 0.4.1 update. As a workaround, you can use the regexToken(...) overload that accepts a String pattern rather than a Regex, as that overload was not affected.
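A rough sketch of that workaround, assuming better-parse 0.4.0 is on the classpath. The num pattern is the one from the report. The token names match the lexer output above, but the literal tokens, the explicit DefaultTokenizer setup and the lexTexts helper are assumptions for illustration, since the original grammar is not shown here.

```kotlin
import com.github.h0tk3y.betterParse.lexer.DefaultTokenizer
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken

// Workaround: pass the pattern as a String instead of calling .toRegex() on it.
val num = regexToken("num", "[\\d.]+(?:[eE]-?\\d+)?")

// Assumed operator and parenthesis tokens; the real grammar may define them differently.
val plus = literalToken("plus", "+")
val mul = literalToken("mul", "*")
val lpar = literalToken("lpar", "(")
val rpar = literalToken("rpar", ")")

val tokenizer = DefaultTokenizer(listOf(num, plus, mul, lpar, rpar))

// Hypothetical helper: the matched text of each token, in order.
fun lexTexts(input: String): List<String> =
    tokenizer.tokenize(input).map { it.text }.toList()

fun main() {
    // Print each token match to compare against the lexer output in the report.
    tokenizer.tokenize("2+2*(2+2)").forEach { println(it) }
}
```

Inside a Grammar the same String overload can be used with property delegation, so only the argument type of the regexToken call has to change.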