
JSON Resultset UTF-8 encoding issues when escaped with \u

It appears that Unicode characters outside the BMP returned in SPARQL JSON resultsets are not properly escaped: they are emitted with an 8-digit \U escape, which JSON does not define, instead of \u.

Here is a DBPedia query that fails:

Encoded characters such as "\U0001B000" should probably be encoded as "\uD82C\uDC00" instead.

knoan commented

Spot on… JSON only supports 4-digit Unicode escape sequences. Unicode characters outside the BMP must be emitted directly as a UTF-8 sequence (allowed by JSON production char) or encoded as surrogate pairs.
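
A minimal sketch of the point above, using the code point U+1B000 from the report as the example. It only checks how the standard `JSON.parse()` treats the two legal encodings and the illegal `\U` form; the variable names are illustrative.

```javascript
// U+1B000 is outside the BMP, so it needs two UTF-16 code units in a JS string.
var raw = String.fromCodePoint(0x1B000);

// Legal encoding 1: a surrogate pair written as two 4-digit \u escapes.
var viaPair = JSON.parse('"\\uD82C\\uDC00"');

// Legal encoding 2: the character emitted directly, without any escape.
var viaRaw = JSON.parse('"' + raw + '"');

// Illegal: JSON defines no 8-digit \U escape, so strict parsers reject it.
var rejected = false;
try {
    JSON.parse('"\\U0001B000"');
} catch (e) {
    rejected = e instanceof SyntaxError;
}
```

Both legal forms decode to the same two-code-unit string, while the `\U` form raises a `SyntaxError`.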

This is a serious bug as browser-provided JSON.parse() doesn't support lenient parsing and breaks on illegal escape sequences, as in


May be reproduced by the following query on the DBpedia endpoint:

prefix rdfs: <>

select * {
   <> rdfs:comment ?c filter (lang(?c) = 'en')
}

knoan commented

The following should work as a stopgap measure:

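    // Rewrite non-standard \UXXXXXXXX escapes as surrogate pairs before parsing.
    JSON.parse(text.replace(/\\U([0-9A-Fa-f]{8})/g, function ($0, $1) {

        var c = parseInt($1, 16) - 0x010000; // offset into the supplementary planes
        var h = (c >> 10) + 0xD800;          // high (lead) surrogate
        var l = (c & 0x3FF) + 0xDC00;        // low (trail) surrogate

        return String.fromCharCode(h, l);

    }));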

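A slightly more defensive variant of the stopgap, offered as a sketch rather than a vetted fix. It assumes the server may also emit `\U` escapes for BMP code points, and that the payload may contain an escaped backslash followed by a literal `U`; neither case is confirmed in this thread. `fixLongEscapes` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: rewrites \UXXXXXXXX escapes into standard JSON \u escapes.
function fixLongEscapes(text) {
    // Capture the whole run of backslashes in front of "U" plus eight hex digits.
    return text.replace(/(\\+)U([0-9A-Fa-f]{8})/g, function (match, slashes, hex) {
        // An even run consists only of escaped backslashes, so "U..." is literal text.
        if (slashes.length % 2 === 0) return match;

        var cp = parseInt(hex, 16);
        // Not a Unicode code point: leave it alone so the parser still reports it.
        if (cp > 0x10FFFF) return match;

        // Keep any escaped backslashes that preceded the escape introducer.
        var prefix = slashes.slice(1);

        // BMP code points fit in a single 4-digit escape.
        if (cp < 0x10000) return prefix + "\\u" + hex.slice(4);

        // Supplementary code points become a surrogate pair.
        var c = cp - 0x10000;
        var hi = 0xD800 + (c >> 10);
        var lo = 0xDC00 + (c & 0x3FF);
        return prefix + "\\u" + hi.toString(16).toUpperCase() +
                        "\\u" + lo.toString(16).toUpperCase();
    });
}

// Usage: repair the text first, then hand it to the strict parser.
var repaired = fixLongEscapes('{"c":"\\U0001B000"}');
var parsed = JSON.parse(repaired);
```

Emitting `\u` escapes instead of raw characters keeps the output valid even when the code point is a quote or a control character, which must not appear unescaped inside a JSON string.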

This issue was fixed a few days ago, and the fix will be making its way to the commercial and open source archives, DBpedia included, in the coming days.

The fix for this issue has been pushed to the open source develop/7 branch:
