VBA-tools/VBA-JSON

Unicode/UTF-8 is unnecessarily escaped

Opened this issue · 15 comments

The result is "\u963f".
Oh my god.
The picture shows my problem. Can anyone help me?

Hi @jujucheng can you give a little more explanation of the issue? Does this describe the problem?

Input: JsonConverter.ConvertToJson(Array("阿"))
Expected: ["阿"]
Actual: ["\u963f"]

OK, I have reproduced the issue. According to the spec, those characters may be escaped but don't need to be. I'll look into changing the behavior or adding an option to escape Unicode/UTF-8 characters conditionally.
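The conditional escaping described above can be sketched as a standalone VBA function. This is a minimal sketch, not VBA-JSON's code: the name `EncodeJsonString` and its `escapeUnicode` flag are hypothetical. It always escapes `"`, `\` and control characters (which JSON requires) and escapes characters from 127 upward only when the flag is set.

```vba
' Hypothetical helper, not part of VBA-JSON: JSON-encodes one string and
' escapes DEL/non-ASCII characters only when escapeUnicode is True.
Public Function EncodeJsonString(ByVal source As String, _
                                 ByVal escapeUnicode As Boolean) As String
    Dim i As Long
    Dim ch As String
    Dim charCode As Long
    Dim result As String

    For i = 1 To Len(source)
        ch = Mid$(source, i, 1)
        charCode = AscW(ch)
        ' AscW returns a negative Integer for code units above &H7FFF
        If charCode < 0 Then charCode = charCode + 65536

        Select Case charCode
        Case 34                  ' double quote: must be escaped
            ch = "\"""
        Case 92                  ' backslash: must be escaped
            ch = "\\"
        Case 8
            ch = "\b"
        Case 9
            ch = "\t"
        Case 10
            ch = "\n"
        Case 12
            ch = "\f"
        Case 13
            ch = "\r"
        Case 0 To 31             ' remaining control characters: must be escaped
            ch = "\u" & Right$("0000" & Hex$(charCode), 4)
        Case Is >= 127           ' DEL and non-ASCII: escaping is optional
            If escapeUnicode Then ch = "\u" & Right$("0000" & Hex$(charCode), 4)
        End Select

        result = result & ch
    Next i

    EncodeJsonString = """" & result & """"
End Function
```

For example, `EncodeJsonString(ChrW(&H963F), True)` returns `"\u963F"` (`Hex$` produces upper-case digits), while passing `False` returns the quoted character itself. Building the result with `&` inside the loop keeps the sketch short but is slow for very long strings.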

Yes, that's the problem. QQ
Really looking forward to a solution!

vikct commented

Hi Tim, just wondering whether this issue has been fixed, because I have run into the same problem.

Something like this: "SMS": "\u77ED\u4FE1" (the escaped form of 短信).

Hi,

This is just how I solved my own case, not a general fix, but I hope it gives you some hints.

Reference site
'https://social.technet.microsoft.com/Forums/en-US/c1def5ab-7c60-4927-b828-f015c4853795/excel-file-to-utf8-encoded-text-file?forum=officesetupdeploylegacy

Set objStream = CreateObject("ADODB.Stream")

'----- Your existing code -----
'Keep the myfile assignment (SaveToFile needs it), but remove or comment out
'the Open/Print/Close lines, because Print # does not write UTF-8.
myfile = Application.ActiveWorkbook.Path & "\data.json"
'Open myfile For Output As #1
'Print #1, ConvertToJson(items, Whitespace:=2)
'Close #1

With objStream
    .Type = 2              ' adTypeText
    .Charset = "utf-8"
    .Open
    .WriteText ConvertToJson(items, Whitespace:=2)
    .SaveToFile myfile, 2  ' adSaveCreateOverWrite
    .Close
End With

In the JsonConverter module:

Private Function json_Encode(ByVal json_Text As Variant) As String

'---- Existing code ----

'Remove or comment out the code below.
Case 0 To 31, 127 To 65535
    ' Non-ascii characters -> convert to 4-digit hex
    json_Char = "\u" & VBA.Right$("0000" & VBA.Hex$(json_AscCode), 4)
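Commenting out that whole Case also stops escaping any control characters (0 to 31) that reach it, and the JSON spec does require those to be escaped, so such text would no longer produce valid JSON. A narrower variant of the same workaround, assuming the Select Case in your copy of json_Encode contains the line above, keeps the first range and drops only the second. This is a sketch of a patch fragment, not a change that exists in VBA-JSON:

```vba
        ' Replacement for the "Case 0 To 31, 127 To 65535" clause in json_Encode.
        ' Control characters must still be escaped to produce valid JSON.
        Case 0 To 31
            json_Char = "\u" & VBA.Right$("0000" & VBA.Hex$(json_AscCode), 4)
        ' 127 To 65535 is no longer listed, so those characters are appended
        ' unchanged (e.g. U+00C1 is written as-is instead of \u00C1).
```

The unescaped characters then survive only if the file is written as UTF-8, for example with the ADODB.Stream code above rather than Print #.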

@nextcrom that solution suited me perfectly, thank you!

Unfortunately all of the fixes for UTF-8 support are Windows-only. It's a major issue that I'm looking into, but may not have a good solution for a while.

Spent a good few minutes today debugging what was producing the strange escaped Unicode in our Access workflow. It would be great if you come across a way to solve this in a cross-platform manner. In the meantime I will look into the workaround @nextcrom kindly provided (thank you!).

I have added pull request #168 to add an option to preserve the Unicode text instead of escaping it. If accepted, this pull request should close this issue.

So, any changes? I still have this problem.

Another problem: it always puts \r into all of the text.
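JSON requires a carriage return inside a string to be escaped, so \r in the output is valid. If it comes from CR+LF line breaks in the source text (an assumption; the comment does not say where the text comes from), one option is to normalize line breaks before converting. `NormalizeLineBreaks` is a hypothetical helper, not part of VBA-JSON:

```vba
' Hypothetical helper, not part of VBA-JSON: converts CR+LF and lone CR
' line breaks to LF so ConvertToJson emits \n instead of \r\n or \r.
Public Function NormalizeLineBreaks(ByVal source As String) As String
    NormalizeLineBreaks = Replace(Replace(source, vbCrLf, vbLf), vbCr, vbLf)
End Function
```

Usage would look like `JsonConverter.ConvertToJson(Array(NormalizeLineBreaks("line 1" & vbCrLf & "line 2")))`, which encodes the break as \n only.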

For accented characters I solved it this way: I commented out the lines highlighted in yellow,
as @nextcrom suggests.

Before: \u00C1NCASH
After: ÁNCASH

The commit from @joyfullservice is not working, but @breshman is absolutely right!

That probably depends on the type of output that you want. If you are writing to a file, some programs may need a UTF-8 BOM in the output file to properly render the extended characters.
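ADODB.Stream with Charset = "utf-8", as in the workaround earlier in this thread, writes a UTF-8 BOM (the bytes EF BB BF) at the start of the file. For programs that expect no BOM, a common Windows-only approach is to skip those three bytes and save the rest through a binary stream. `WriteUtf8NoBom` is a hypothetical name; this is a sketch, not part of VBA-JSON:

```vba
' Hypothetical helper (Windows only, uses ADODB.Stream): saves text as UTF-8
' without the 3-byte BOM that the "utf-8" charset writes by default.
Public Sub WriteUtf8NoBom(ByVal path As String, ByVal content As String)
    Dim textStream As Object
    Dim binaryStream As Object

    Set textStream = CreateObject("ADODB.Stream")
    textStream.Type = 2                 ' adTypeText
    textStream.Charset = "utf-8"
    textStream.Open
    textStream.WriteText content

    textStream.Position = 3             ' skip the BOM bytes EF BB BF

    Set binaryStream = CreateObject("ADODB.Stream")
    binaryStream.Type = 1               ' adTypeBinary
    binaryStream.Open
    textStream.CopyTo binaryStream      ' copy everything after the BOM

    binaryStream.SaveToFile path, 2     ' adSaveCreateOverWrite
    binaryStream.Close
    textStream.Close
End Sub
```

Usage might be `WriteUtf8NoBom ActiveWorkbook.Path & "\data.json", ConvertToJson(items, Whitespace:=2)`; keep the BOM (the plain ADODB.Stream version) when the consuming program needs it to detect UTF-8.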

Worked like a charm for Portuguese (BR). Thanks.