Issue with hex escaped characters
Closed this issue · 8 comments
According to regex101.com, regexes can use \x to escape a hex character. For instance, \x3f would mean ?. This may also appear inside character classes. But the current code does not support this.
You can add the following lines of code to the pEscaped function to add this functionality.

    'x' -> do -- \xHH matches hex HH
      ds <- A.take 2
      case readMay ("'\\x" ++ U.toString ds ++ "'") of
        Just x -> return x
        Nothing -> fail "invalid hex character escape"
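
For readers following along: the patch above appears to work by building the text of a Haskell character literal (for example '\x3f') and reading it back with readMay. A minimal sketch of the same conversion using only base is shown below. The name hexToChar is hypothetical and is not part of skylighting; it is only meant to illustrate how two hex digits map to a character.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- Hypothetical helper (not from the library): turn a string of hex digits
-- into the character with that code point. Returns Nothing for empty input,
-- non-hex input, or values above the largest Unicode code point.
hexToChar :: String -> Maybe Char
hexToChar ds
  | null ds || not (all isHexDigit ds) = Nothing
  | otherwise =
      case readHex ds of
        [(n, "")] | n <= (0x10FFFF :: Integer) -> Just (chr (fromInteger n))
        _ -> Nothing
```

Under these assumptions, hexToChar "3f" gives Just '?', and hexToChar "zz" gives Nothing, which corresponds to the "invalid hex character escape" failure in the patch.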
Oh, yes, I see that in fact this syntax is used in the syntax definitions:
skylighting-core/xml/powershell.xml
888: <RegExpr attribute="Variable" context="#stay" String="\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*" />
Great. Can you take a look at whether the patch I suggested above will fix it?
Yes, the fix was just right -- but I also added code to handle \x{HH...}, which also occurs in the syntax defs.
Thanks!
According to regex101.com, it seems that \x2 is also an acceptable way of writing \x02. How can I support this style?
What do you think of this?
    'x' -> do -- \xhh matches hex hh, \x{h+} matches hex h+
      ds <- (satisfy (== 123) *> A.takeWhile (/= 125) <* satisfy (== 125))
            <|> ( do
                    x1 <- satisfy (inClass "a-fA-F0-9")
                    x2 <- peekWord8
                    case x2 of
                      Nothing -> return $ B.pack [x1]
                      Just x2' -> if inClass "a-fA-F0-9" x2'
                                    then return $ B.pack [x1, x2']
                                    else return $ B.pack [x1]
                )
peekWord8 doesn't consume the character. So you'd need to be sure to consume the character if it's in the relevant class.
Why not just

    x2 <- (Just <$> satisfy (inClass "a-fA-F0-9")) <|> pure Nothing
But I'd want to confirm that this works with KDE's regex engine before adding anything to this library.
You are right, I had to make that change.
    'x' -> do -- \xhh matches hex hh, \x{h+} matches hex h+
      ds <- (satisfy (== 123) *> A.takeWhile (/= 125) <* satisfy (== 125))
            <|> ( do
                    x1 <- satisfy (inClass "a-fA-F0-9")
                    x2 <- peekWord8
                    case x2 of
                      Nothing -> return $ B.pack [x1]
                      Just x2' -> if inClass "a-fA-F0-9" x2'
                                    then anyWord8 >> (return $ B.pack [x1, x2'])
                                    else return $ B.pack [x1]
                )

This also seems to work.
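
For anyone who wants to experiment with the "one or two hex digits, or a braced run" idea outside the library, here is a rough, self-contained sketch. It swaps in ReadP from base instead of attoparsec so that it runs without extra packages, and it uses ReadP's left-biased choice (<++) where the thread uses <|>. The names hexEscape and runHexEscape are hypothetical, and whether KDE's regex engine actually accepts the single-digit form is not confirmed here.

```haskell
import Data.Char (chr, isHexDigit)
import Data.Maybe (maybeToList)
import Numeric (readHex)
import Text.ParserCombinators.ReadP
  (ReadP, between, char, munch1, pfail, readP_to_S, satisfy, (<++))

-- Hypothetical stand-in for the 'x' branch: parses what follows "\x".
-- Accepts {h+}, hh, or a single h, and consumes every digit it uses.
hexEscape :: ReadP Char
hexEscape = do
  ds <- braced <++ oneOrTwo
  case readHex ds of
    [(n, "")] | n <= (0x10FFFF :: Integer) -> pure (chr (fromInteger n))
    _ -> pfail
  where
    -- \x{h+}: one or more hex digits between braces
    braced = between (char '{') (char '}') (munch1 isHexDigit)
    -- \xh or \xhh: the optional second digit is consumed when present
    oneOrTwo = do
      x1 <- satisfy isHexDigit
      x2 <- (Just <$> satisfy isHexDigit) <++ pure Nothing
      pure (x1 : maybeToList x2)

-- Returns the decoded character and the unconsumed rest of the input.
runHexEscape :: String -> Maybe (Char, String)
runHexEscape s = case readP_to_S hexEscape s of
  [(c, rest)] -> Just (c, rest)
  _ -> Nothing
```

With this sketch, runHexEscape "41B" yields Just ('A', "B"): both digits are consumed and only the trailing B is left over, which is the behavior the peekWord8 version needed anyWord8 to get.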