cburgmer/json-path-comparison

Improve string escape usability in Proposal A

glyn opened this issue · 1 comments

glyn commented

Currently, Proposal A uses the following (PEG.js) syntax for child names that can appear in brackets with single quotes:

SingleQuotedString
  = x:"\\'" xs:SingleQuotedString { return "'" + xs; }
  / x:"\\\\" xs:SingleQuotedString { return "\\" + xs; }
  / x:[^'] xs:SingleQuotedString { return x + xs; }
  / ''

Using an online evaluator for PEG.js and feeding in selector.peg, observe that the selector:

$['\'', '\\', '\n', '\\n']

parses to:

[
   [
      "children",
      [
         [
            "name",
            "'"
         ],
         [
            "name",
            "\"
         ],
         [
            "name",
            "\n"
         ],
         [
            "name",
            "\n"
         ]
      ]
   ]
]

Firstly, notice the redundancy: both '\n' and '\\n' are parsed as "\n". But, more importantly, notice that '\n' is treated as valid.

A user who forgets which characters need to be escaped might think that '\n' represents a string consisting of a newline character. Allowing this to parse is potentially unhelpful.

If we remove the redundancy, we can fail unsupported escape sequences (and allow for other escape sequences to be added in future).

For example, the alternative syntax:

SingleQuotedString
  = x:"\\'" xs:SingleQuotedString { return "'" + xs; }
  / x:"\\\\" xs:SingleQuotedString { return "\\" + xs; }
  / x:[^'\\] xs:SingleQuotedString { return x + xs; }
  / ''

fails to parse '\n' with the message Expected "'", "\\'", "\\\\", or [^'\\] but "\\" found, but loses no expressive power.

Nice catch. Looks good to me!

Will be interesting to see how the other implementations handle this.