CoatiSoftware/SourcetrailDB

Error: ill-formed UTF-8 byte


When property or method names contain special characters or diacritics (º, ª, á, ñ, ç, etc.), deserializing the name hierarchy fails with the following error:

Unable to deserialize name hierarchy "{ "name_delimiter" : ".", "name_elements" : [ { "prefix" : "", "name" : "ExampleNamespace", "postfix" : "" },
{ "prefix" : "", "name" : "ExampleClass", "postfix" : "" },
{ "prefix" : "public string", "name" : "Peça", "postfix" : "" }] }": [json.exception.parse_error.101] parse error at 263: syntax error - invalid string: ill-formed UTF-8 byte; last read: '"Peça'

Here is the same JSON, formatted:

{
    "name_delimiter": ".",
    "name_elements": [
        {
            "prefix": "",
            "name": "ExampleNamespace",
            "postfix": ""
        },
        {
            "prefix": "",
            "name": "ExampleClass",
            "postfix": ""
        },
        {
            "prefix": "public string",
            "name": "Peça",
            "postfix": ""
        }
    ]
}

The following C# example produces the JSON and the error above:

namespace ExampleNamespace
{
    public class ExampleClass
    {
        public string Peça { get; set; }
    }
}

This looks like an encoding issue in the JSON string you pass in, not a problem in the parser. I tested it with a C++ example and initially hit the same error you describe, but once I prefixed the string literals with u8 so that they are encoded as UTF-8, it worked:

std::string error;
dbWriter.recordSymbol(sourcetrail::deserializeNameHierarchyFromJson(
	u8"{ \"name_delimiter\" : \".\", \"name_elements\" : [ "
	u8"{ \"prefix\" : \"\", \"name\" : \"ExampleNamespace\", \"postfix\" : \"\" }, "
	u8"{ \"prefix\" : \"\", \"name\" : \"ExampleClass\", \"postfix\" : \"\" }, "
	u8"{ \"prefix\" : \"public string\", \"name\" : \"Peça\", \"postfix\" : \"\" }] }", &error));
std::cout << "error: " << error << std::endl;

And here is the output:

[image: screenshot of the program output]
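If the string comes from somewhere other than a literal (a file, another process, a binding), the u8 prefix is not an option, so it may help to check the bytes before handing them to deserializeNameHierarchyFromJson. The following is only a sketch: isWellFormedUtf8 and latin1ToUtf8 are hypothetical helpers, not part of SourcetrailDB, and the conversion assumes the offending bytes are ISO-8859-1 (Latin-1), which may not match your setup.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SourcetrailDB): returns true if `s` is
// well-formed UTF-8. Rejects truncated sequences, stray continuation bytes,
// overlong encodings, surrogates and code points above U+10FFFF.
bool isWellFormedUtf8(const std::string& s)
{
	static const unsigned long minCodePoint[] = { 0, 0, 0x80, 0x800, 0x10000 };
	const std::size_t n = s.size();
	std::size_t i = 0;
	while (i < n)
	{
		const unsigned char lead = static_cast<unsigned char>(s[i]);
		std::size_t len = 0;
		unsigned long cp = 0;
		if (lead < 0x80) { len = 1; cp = lead; }
		else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
		else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
		else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
		else { return false; } // stray continuation byte or invalid lead byte

		if (i + len > n) { return false; } // sequence is cut off
		for (std::size_t k = 1; k < len; ++k)
		{
			const unsigned char cont = static_cast<unsigned char>(s[i + k]);
			if ((cont & 0xC0) != 0x80) { return false; } // not a continuation byte
			cp = (cp << 6) | (cont & 0x3F);
		}
		if (len > 1 && cp < minCodePoint[len]) { return false; } // overlong form
		if (cp > 0x10FFFF) { return false; }
		if (cp >= 0xD800 && cp <= 0xDFFF) { return false; } // UTF-16 surrogate
		i += len;
	}
	return true;
}

// Hypothetical helper: re-encodes a string as UTF-8, ASSUMING every input
// byte is one ISO-8859-1 (Latin-1) character. Wrong for any other encoding.
std::string latin1ToUtf8(const std::string& s)
{
	std::string out;
	out.reserve(s.size() * 2);
	for (std::size_t i = 0; i < s.size(); ++i)
	{
		const unsigned char c = static_cast<unsigned char>(s[i]);
		if (c < 0x80)
		{
			out.push_back(static_cast<char>(c));
		}
		else
		{
			// Code points U+0080..U+00FF take two bytes in UTF-8.
			out.push_back(static_cast<char>(0xC0 | (c >> 6)));
			out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
		}
	}
	return out;
}
```

With these helpers, the Latin-1 bytes for "Peça" ("Pe\xE7" "a") fail the check, while the UTF-8 bytes ("Pe\xC3\xA7" "a") pass, which is consistent with the parse error reported above. Also note that from C++20 on, u8 literals have type const char8_t[], so they no longer convert to std::string directly; with older standards the snippet above compiles as shown.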