Invalid parsing of escaped unicode values
bmcminn opened this issue · 4 comments
I'm currently using this library in a Grunt task and ran into the following issue:
```
// JSON file data being linted
{
    "copyright": "\u2117 & \u00a9 2014 {{sitename}}"
}
```

```
// BASH error...
Invalid Reverse Solidus '\' declaration.
```
Just tested the above snippet against jsonlint.com, jsonlint pro, and jsoneditoronline, and they all accept the escaped unicode characters and parse the snippet as valid JSON.
This snippet lives in a much deeper part of my JSON data, which is compiled via PHP's `json_encode` function; the raw escaped unicode values cause this linter to throw the above error. Escaping the reverse solidus (`"\\u2117 & \\u00a9 2014 {{sitename}}"`) "fixes" the issue, though that's inconvenient since most systems emit escaped unicode values in this form by default.
Further testing shows that `rvalidsolidus` on `jsonlint.js:7` is improperly regexing for the appropriate `u[0-9]` combination. Changing it as described below remedies the problem, though it doesn't make sense that the explicit length of `[0-9]{4}` would break like this:
```js
// ...
rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]{4})/, // original version
rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]+)/,   // my change
```
Edit: regex demo showing that the `{4}` should work...
Just figured out why it invalidates: the regex is ONLY matching numeric unicode values. Updating `jsonlint.js:7` as follows corrects the problem.
```js
// ...
rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]{4})/,     // original version
rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9A-F]{4})/i, // my change
// > catches \u1234 AND \u12aE
```
@codenothing I just finished updating the test on my fork and it passes. I had to modify the json-lint
dependency for nlint
because it uses jsonlint and had the same regex problem I'm trying to fix :P
In reading up on the Unicode spec, under Architecture and terminology, it specifies that the Basic Multilingual Plane occupies the range `0000`–`FFFF`, so my changes reflect this standard; outside of Plane 0 you get into larger byte sets that the regex is not handling.
In any case, this is a pretty involved issue, because I have no idea what your goal was in supporting a particular unicode spec or how robust that validation should be. You can review my changes here (https://github.com/bmcminn/jsonlint) and see what you think, though I plan to issue a pull request to resolve this issue.
EDIT: Pull request issued #3
Finally have a spec to reference for implementation validation: http://rfc7159.net/rfc7159#unichars