BioJulia/Automa.jl

Cannot use backslashes in (inverted) classes

Closed this issue · 4 comments

re"[x]" gives Automa.RegExp.RE(:class, Any[0x78:0x78], ...
So you would assume re"[^\\\\]" would give: Automa.RegExp.RE(:class, Any[0x5c:0x5c], ... whereas it actually gives Automa.RegExp.RE(:class, Any[0x5c:0x5c, 0x5d:0x5d], ... (] seems to be included as well)

The actual expression I am trying to parse, re"'([^\\\\']|([\\\\].))+'", fails with:

ERROR: LoadError: LoadError: lparen
Stacktrace:
 [1] error(::Symbol) at ./error.jl:42
 [2] (::getfield(Automa.RegExp, Symbol("#pop_and_apply!#1")){Array{Automa.RegExp.RE,1},Array{Symbol,1}})() at .../src/re.jl:154
 [3] parse(::String) at .../src/re.jl:220
 [4] @re_str(::LineNumberNode, ::Module, ::String) at .../src/re.jl:107

An expression that seems to work is re"'([^\\x5c']|(\\x5c.))+'".

I'm definitely not a regex expert, and Automa's regular expressions are less extensive than those of other engines such as Julia's built-in one. But I'm not sure why you want to use \\ (representing the literal \) twice in a character class. Such sets say "one of these options" (or "none of these" in the case of a negated character class), so repeating a member adds nothing.

Well, ideally you would use only one backslash, as in Julia's and most other regex implementations, but that does not work at all:

julia> re"[\\]"
ERROR: LoadError: ArgumentError: invalid escape sequence \]
Stacktrace:

The class [xx] usually works the same as [x], so doubling the backslash should at least do no harm. If the code is fixed to handle a single-backslash escape properly, I think that would resolve the other issues.
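
For reference, the wording of that error matches what Julia's Base.unescape_string raises for an escape it does not recognize. Below is a minimal sketch using only Base. Whether Automa's re"..." macro calls unescape_string internally is an assumption inferred from the message, not something confirmed in this thread, and try_unescape is a hypothetical helper written for this illustration.

```julia
# Hypothetical helper: return either the unescaped string or the exception.
# raw"..." is used so the backslashes reach unescape_string untouched.
function try_unescape(s::AbstractString)
    try
        return unescape_string(s)
    catch err
        return err
    end
end

bad = try_unescape(raw"\]")    # `\]` is not an escape sequence Base knows
ok  = try_unescape(raw"\\]")   # `\\` is a known escape for one backslash

println(bad isa ArgumentError) # the same exception type as in the report
println(ok)                    # the unescaped text: a backslash followed by ]
```

If that assumption holds, any backslash that survives into the unescaping step must be followed by something Base treats as a valid escape, which would explain why a lone backslash in front of ] is rejected outright.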

One and two backslashes are escaped to the same thing:

julia> re"\\\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)

julia> re"\\"
Automa.RegExp.RE(:char, ['\\'], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
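
These identical results are consistent with how Julia parses every prefixed string literal (raw"...", re"...", and so on): backslashes pass through verbatim unless they directly precede a quote, in which case each pair collapses to one. The sketch below uses Base's raw"..." as a stand-in for re"...", since both go through the same literal parsing. The follow-up unescape_string call is an assumption about what Automa does next.

```julia
one = raw"\\"      # two backslashes right before the closing quote
two = raw"\\\\"    # four backslashes right before the closing quote

println(length(one))   # number of characters the macro actually receives
println(length(two))

# Assumed second step: one round of Base.unescape_string.
# Both inputs end up as the same single backslash.
println(unescape_string(one) == unescape_string(two))
```

So the two spellings may only look different in source: by the time a regex parser sees them, both could already be one backslash.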

And hexadecimal escapes also only work with a double backslash:

julia> re"[\x5c]"
Automa.RegExp.RE(:class, Any[0x5d:0x5d], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)

julia> re"[\\x5c]"
Automa.RegExp.RE(:class, Any[0x5c:0x5c], DataStructures.DefaultDict{Symbol,Array{Symbol,1},typeof(Automa.RegExp.gen_empty_names)}(), nothing)
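
One way to read those two results is as two unescaping layers stacked on top of each other. The sketch assumes (it is not confirmed here) that the macro first runs the literal through Base.unescape_string and that the class parser then treats a backslash as an escape for the character after it. raw"..." stands in for re"..." because both receive the literal text the same way.

```julia
single = raw"[\x5c]"    # the text re"[\x5c]" hands to the macro
double = raw"[\\x5c]"   # the text re"[\\x5c]" hands to the macro

# Assumed first layer: Base.unescape_string.
after_single = unescape_string(single)
after_double = unescape_string(double)

println(after_single)   # prints [\]    : the backslash now sits in front of ]
println(after_double)   # prints [\x5c] : the hex escape survives for a second pass
```

Under that reading, the single-backslash form leaves `\]` inside the brackets, which would fit the 0x5d:0x5d result above, while the double-backslash form still carries an intact \x5c for the regex parser to decode.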

Yeah, the regular expression parser of Automa.jl should more carefully handle backslash characters. I think it has many bugs and needs an overhaul. I will take a look soon.

Thanks. Looks pretty good from what I remember when looking at the code last :)