Cannot use backslashes in (inverted) classes
re"[x]" gives Automa.RegExp.RE(:class, Any[0x78:0x78], ...
So you would assume re"[^\\\\]" would give Automa.RegExp.RE(:class, Any[0x5c:0x5c], ...
whereas it actually gives Automa.RegExp.RE(:class, Any[0x5c:0x5c, 0x5d:0x5d], ... (] seems to also be included).
For the actual expression I am trying to parse, re"'([^\\\\']|([\\\\].))+'", I get:
ERROR: LoadError: LoadError: lparen
Stacktrace:
[1] error(::Symbol) at ./error.jl:42
[2] (::getfield(Automa.RegExp, Symbol("#pop_and_apply!#1")){Array{Automa.RegExp.RE,1},Array{Symbol,1}})() at .../src/re.jl:154
[3] parse(::String) at .../src/re.jl:220
[4] @re_str(::LineNumberNode, ::Module, ::String) at .../src/re.jl:107
An expression that seems to work is re"'([^\\x5c']|(\\x5c.))+'".
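For reference, what that pattern is meant to match (a single-quoted string in which a backslash escapes the next character) can be checked against Julia's built-in Regex. This is only a sketch with PCRE semantics, not Automa's parser, and the name `quoted` and the sample inputs are made up for illustration.

```julia
# Julia's built-in Regex (PCRE), not Automa.
# Body: one or more of either a character that is neither a backslash nor a quote,
# or a backslash followed by any one character.
quoted = r"^'([^\\']|\\.)+'$"

# Sample inputs (hypothetical): 'abc', 'a\'b' and 'a\\'
accepted = ["'abc'", "'a\\'b'", "'a\\\\'"]
# Empty body, missing closing quote, unescaped quote in the middle
rejected = ["''", "'abc", "'a'b'"]

for s in accepted
    @assert occursin(quoted, s)
end
for s in rejected
    @assert !occursin(quoted, s)
end
```

With the built-in engine a single `\\` inside the class is enough, which is the behaviour the Automa expression is trying to reproduce.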
I'm definitely not a RegExp expert, and Automa's regular expressions are less extensive than those of other regexp engines such as Julia's built-in one. But I'm not sure why you want to use \\ (representing the literal \) twice in a character class/set, given that such sets are for saying "one of these options" (or "none of these" in the case of a negated character class).
Well, ideally you would only use one backslash just like in Julia's and most other RegEx implementations, but this does not work at all:
julia> re"[\\]"
ERROR: LoadError: ArgumentError: invalid escape sequence \]
Stacktrace:
The class [xx] usually works the same as [x], so having double backslashes should at least not hurt in any way. If the code is fixed to properly handle the escape using one backslash, I think that would resolve the other issues.
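The point that a repeated member does not change a class can be checked with Julia's built-in Regex. This is a small sketch using PCRE semantics rather than Automa, and the variable names are illustrative only.

```julia
# Julia's built-in Regex (PCRE), not Automa: a class behaves like a set,
# so listing a member twice does not change what it matches.
single = r"^[x]$"
double = r"^[xx]$"
samples = ["x", "xx", "y", ""]
for s in samples
    @assert occursin(single, s) == occursin(double, s)
end

# The same holds for a backslash: both classes match exactly one backslash.
one_escape = r"^[\\]$"
two_escapes = r"^[\\\\]$"
@assert occursin(one_escape, "\\")
@assert occursin(two_escapes, "\\")
@assert !occursin(two_escapes, "\\\\")   # two backslashes are two characters
```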
One and two backslashes are escaped to the same thing:
julia> re"\\\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
julia> re"\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
And escaping hexadecimal also only works if you use double backslash:
julia> re"[\x5c]"
Automa.RegExp.RE(:class, Any[0x5d:0x5d], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
julia> re"[\\x5c]"
Automa.RegExp.RE(:class, Any[0x5c:0x5c], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
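Part of the confusion may come from how many layers of escaping are involved. The sketch below uses only Base Julia and does not call Automa. It shows what text a non-standard string literal receives (the same rules as raw"...") and what one further round of Base.unescape_string does to it. The error shown earlier has the same wording as the one unescape_string raises, but whether Automa's parser actually calls it is an assumption here.

```julia
# Base Julia only; this does not call Automa.
# A non-standard string literal such as re"..." receives the same raw text as raw"...".
@assert length(raw"\\") == 1        # backslashes right before the closing quote are halved
@assert length(raw"\\\\") == 2
@assert length(raw"[\\]") == 4      # elsewhere they are passed through untouched
@assert length(raw"[\x5c]") == 6

# One further round of unescaping, as Base.unescape_string does it:
from_hex = unescape_string(raw"[\x5c]")
from_double = unescape_string(raw"[\\x5c]")
@assert from_hex == "[\\]"          # \x5c becomes a real backslash
@assert from_double == raw"[\x5c]"  # \\ becomes \, and x5c stays plain text

# A backslash followed by ] is not a valid escape for unescape_string:
threw = try
    unescape_string(raw"[\]")
    false
catch err
    err isa ArgumentError
end
@assert threw
```

Under that assumption, a parser that unescapes the text and then treats backslash as its own escape character effectively processes each backslash twice, which would be consistent with the results above.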
Yeah, the regular expression parser of Automa.jl should handle backslash characters more carefully. I think it has many bugs and needs an overhaul. I will take a look soon.
Thanks. Looks pretty good from what I remember when looking at the code last :)