nitishm/go-rejson

JSONGet returns invalid data when a struct with Unicode strings is written with JSONSet.


Describe the bug
JSONGet does not produce the correct output for Cyrillic text.

To Reproduce
Steps to reproduce the behavior:

  1. Use JSONSet to write a structure with string fields. Fill the fields with non-ASCII Unicode characters (for example, Cyrillic).
  2. Use res, err := JSONGet() to read the structure.
  3. Convert res into []byte, either with a type assertion (res.([]byte)) or with redigo's redis.Bytes(res, err).
  4. Unmarshal the []byte result into the structure with json.Unmarshal().
  5. Compare the values you wrote with the values json.Unmarshal produced from the JSONGet result.
  6. The new structure contains fields with different (seemingly random) characters.
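The steps above can be reproduced without a Redis server by simulating the round trip in memory. This is a minimal sketch, assuming the stored JSON reaches rjs.StringToBytes as a Go string; `buggyStringToBytes` is a hypothetical copy of the per-rune loop quoted under Additional context, and `Person` and `roundTrip` are illustrative names, not part of go-rejson.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is an illustrative struct with a single string field.
type Person struct {
	Name string `json:"name"`
}

// buggyStringToBytes copies the per-rune loop from rjs.StringToBytes
// (hypothetical reproduction): each rune is truncated to its lowest byte.
func buggyStringToBytes(s string) []byte {
	var by []byte
	for _, r := range s {
		by = append(by, byte(r))
	}
	return by
}

// roundTrip simulates steps 1-4 without Redis: marshal the struct (JSONSet),
// receive the reply as a string, convert it back to bytes with the buggy
// loop, and unmarshal into a new struct.
func roundTrip(written Person) (Person, error) {
	stored, err := json.Marshal(written)
	if err != nil {
		return Person{}, err
	}
	var read Person
	err = json.Unmarshal(buggyStringToBytes(string(stored)), &read)
	return read, err
}

func main() {
	written := Person{Name: "São Paulo"}
	read, err := roundTrip(written)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	// Steps 5-6: the values differ because "ã" lost its UTF-8 encoding.
	fmt.Printf("written=%q read=%q equal=%v\n", written.Name, read.Name, written == read)
}
```

Depending on the text, json.Unmarshal may instead fail outright: for example, "П" (U+041F) is truncated to 0x1F, a control character that JSON does not allow inside strings.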

Expected behavior
The fields in the first structure (the one written) and the second one (the one read back) should match.

Additional context
The problem lies in the rjs.StringToBytes function, which JSONGet calls. It contains the following loop (_lst is a string, by is a []byte):

for _, s := range _lst {
    by = append(by, byte(s))
}

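Here, s is a rune, an alias for int32 that holds a decoded Unicode code point. Converting it to byte keeps only the least significant byte, so every non-ASCII character (code points above U+007F), which UTF-8 encodes as two or more bytes, is replaced by a single wrong byte. The fix is straightforward: convert the string to []byte directly instead of looping over each rune: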

by = []byte(_lst)
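To see the difference concretely, compare both conversions on a single Cyrillic letter. "Ж" is U+0416, which UTF-8 encodes as the two bytes 0xD0 0x96; the per-rune loop keeps only the low byte 0x16. A small self-contained sketch (`perRune` and `direct` are illustrative names, not go-rejson functions):

```go
package main

import "fmt"

// perRune reproduces the loop from rjs.StringToBytes: every rune
// (a decoded code point) is truncated to its lowest 8 bits.
func perRune(s string) []byte {
	var by []byte
	for _, r := range s {
		by = append(by, byte(r))
	}
	return by
}

// direct is the proposed fix: a plain conversion copies the string's
// UTF-8 bytes unchanged.
func direct(s string) []byte {
	return []byte(s)
}

func main() {
	s := "Ж"
	fmt.Printf("perRune: % x\n", perRune(s)) // perRune: 16
	fmt.Printf("direct:  % x\n", direct(s))  // direct:  d0 96
}
```

For pure ASCII input both functions return the same bytes, which is why the bug only shows up with non-ASCII text.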

I copied JSONGet into my own code, applied this fix, and the Unicode problem was solved.

I was having problems with Brazilian Portuguese accented characters, and this fix worked!