jgm/skylighting

Issue with hex escaped characters

Closed this issue · 8 comments

According to regex101.com, regexes could use a \x to escape a hex character. For instance, \x3f would mean ?. This may also appear inside character classes. But the current code does not support this.

'0' -> do -- \0ooo matches octal ooo

You can add the following lines of code to the pEscaped function to add this functionality.

    'x' -> do -- \xHH matches hex HH
      ds <- A.take 2
      case readMay ("'\\x" ++ U.toString ds ++ "'") of
        Just x -> return x
        Nothing -> fail "invalid hex character escape"
jgm commented

Oh, yes, I see that in fact this syntax is used in the syntax definitions:

skylighting-core/xml/powershell.xml
888:        <RegExpr attribute="Variable" context="#stay" String="\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*" />

Great. Can you take a look at whether the patch I suggested above will fix it?

jgm commented

Yes, the fix was just right -- but I also added code to handle \x{HH...} which also occurs in the syntax defs.
Thanks!

According to regex101.com, it seems that \x2 is also an acceptable way of writing \x02. How can I support this style?

What do you think of this?

    'x' -> do -- \xhh matches hex hh, \x{h+} matches hex h+
      ds <- (satisfy (== 123) *> A.takeWhile (/= 125) <* satisfy (== 125))
             <|> ( do
                      x1 <- satisfy (inClass "a-fA-F0-9")
                      x2 <- peekWord8
                      case x2 of
                        Nothing -> return $ B.pack [x1]
                        Just x2' -> if inClass "a-fA-F0-9" x2'
                                    then return $ B.pack [x1, x2']
                                    else return $ B.pack [x1]
                  )
jgm commented

peekWord8 doesn't consume the character. So you'd need to be sure to consume the character if it's in the relevant class.

Why not just

x2 <- Just <$> satisfy (inClass "a-fA-F0-9")) <|> pure Nothing
jgm commented

But I'd want to confirm that this works with KDE's regex engine before adding anything to this library.

You are right, I had to make that change.

'x' -> do -- \xhh matches hex hh, \x{h+} matches hex h+
      ds <- (satisfy (== 123) *> A.takeWhile (/= 125) <* satisfy (== 125))
             <|> ( do
                      x1 <- satisfy (inClass "a-fA-F0-9")
                      x2 <- peekWord8
                      case x2 of
                        Nothing -> return $ B.pack [x1]
                        Just x2' -> if inClass "a-fA-F0-9" x2'
                                    then anyWord8 >> (return $ B.pack [x1, x2'])
                                    else return $ B.pack [x1]
                  )

also seems to work