endocrimes/Jay

UTF-16 unsupported

remko opened this issue

remko commented

When calling `jsonFromData` with an array of UTF-16-encoded bytes, Jay throws `unimplemented`.

The RFC recommends determining the encoding by looking at the initial bytes (section 3).
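A minimal Swift sketch of that detection, assuming the RFC meant is RFC 4627, whose section 3 tells the encodings apart by the pattern of zero bytes in the first four octets (relying on the first two characters of a JSON text being ASCII). `DetectedEncoding` and `detectEncoding` are hypothetical names, not part of Jay's API. Byte order marks are not handled, and inputs shorter than four bytes fall back to UTF-8.

```swift
/// Hypothetical: the encodings RFC 4627, section 3, distinguishes
/// from the first four octets of a JSON text.
enum DetectedEncoding: Equatable {
    case utf8, utf16BE, utf16LE, utf32BE, utf32LE
}

/// Hypothetical: guesses the encoding from which of the first four bytes are zero.
/// Assumes the first two characters are ASCII and that there is no byte order mark.
func detectEncoding(_ bytes: [UInt8]) -> DetectedEncoding {
    // Too short to apply the pattern: fall back to the UTF-8 default.
    guard bytes.count >= 4 else { return .utf8 }
    let zero = bytes.prefix(4).map { $0 == 0 }
    switch (zero[0], zero[1], zero[2], zero[3]) {
    case (true, true, true, false):  return .utf32BE  // 00 00 00 xx
    case (true, false, true, false): return .utf16BE  // 00 xx 00 xx
    case (false, true, true, true):  return .utf32LE  // xx 00 00 00
    case (false, true, false, true): return .utf16LE  // xx 00 xx 00
    default:                         return .utf8     // xx xx xx xx
    }
}
```

For example, `{}` encoded as UTF-16LE is `7B 00 7D 00`, which matches the `xx 00 xx 00` pattern.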

I don't really want UTF-16 myself, but if the API offers only a raw-bytes interface, I would expect it to accept any encoding, as JSONSerialization does.

Good point. The first step would be to state in the inline documentation that everything is UTF-8.

As for supporting UTF-16, I don't really see the point. I've seen statistics showing that the vast majority of Internet communication is UTF-8, so with limited resources I think that's the thing to focus on.

That said, I suspect a nice PR would be accepted (@dantoml?).

remko commented

It's probably a relatively easy fix, but I'm not really interested in UTF-16 either (UTF-8 everywhere and all). I was wondering, though, what the point of a raw-data API is if all it accepts is text in one specific encoding anyway. It might as well take a String instead and leave no room for error, thanks to the type system: if all the user has is Data/[UInt8], the compiler asking for a String at least makes them think about which encoding their bytes are in.
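To illustrate the type-system argument, here is a small Swift sketch; `utf8Bytes(of:)` is a hypothetical helper, not Jay's API. A Swift `String`'s `utf8` view always produces UTF-8 code units, so a String-based entry point could hand the parser bytes whose encoding is known, while a caller starting from raw code units has to name an encoding explicitly to build the String.

```swift
/// Hypothetical helper (not Jay's API): the bytes a String-based entry
/// point would pass to a byte-level parser. `String.utf8` always yields
/// UTF-8 code units, whatever encoding the String was built from.
func utf8Bytes(of jsonText: String) -> [UInt8] {
    return Array(jsonText.utf8)
}

// A caller holding raw code units must state their encoding to build a String.
let utf8Input: [UInt8] = [0x7B, 0x7D]         // "{}" as UTF-8 bytes
let utf16Input: [UInt16] = [0x007B, 0x007D]   // "{}" as UTF-16 code units
let fromUTF8 = String(decoding: utf8Input, as: UTF8.self)
let fromUTF16 = String(decoding: utf16Input, as: UTF16.self)

// Either way, the parser would then receive the same UTF-8 bytes.
print(utf8Bytes(of: fromUTF8) == utf8Bytes(of: fromUTF16))  // prints "true"
```

Note that `String(decoding:as:)` does not fail on malformed input; it substitutes U+FFFD, so a real API might prefer a validating conversion.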

Ah, I see; yes, that makes sense. I'd lean towards 1) documenting that we only accept UTF-8, and maybe 2) detecting the encoding from the first bytes while parsing and throwing a specific error saying that the encoding is incompatible and UTF-8 is required. That should cover most of the failing cases.
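Option 2 could look roughly like the following Swift sketch. `InputEncodingError` and `requireUTF8` are hypothetical names, not existing Jay API. The check relies on the fact that a valid UTF-8 JSON text never contains a raw zero byte (JSON requires control characters to be escaped), whereas the UTF-16 and UTF-32 encodings of ASCII characters such as `{` or `[` always do; it also rejects UTF-16/UTF-32 byte order marks. It does not validate the rest of the input as UTF-8.

```swift
/// Hypothetical error: the input appears not to be UTF-8.
enum InputEncodingError: Error, Equatable {
    case notUTF8(hint: String)
}

/// Hypothetical pre-parse check on the first four bytes only.
/// A UTF-8 byte order mark (EF BB BF) is let through.
func requireUTF8(_ bytes: [UInt8]) throws {
    let head = Array(bytes.prefix(4))
    // FE FF is the UTF-16BE BOM; FF FE starts both the UTF-16LE and UTF-32LE BOMs.
    if head.starts(with: [0xFE, 0xFF]) || head.starts(with: [0xFF, 0xFE]) {
        throw InputEncodingError.notUTF8(
            hint: "found a UTF-16/UTF-32 byte order mark; Jay expects UTF-8")
    }
    // A zero byte never appears in valid UTF-8 JSON, but does in UTF-16/32
    // encodings of ASCII (this also catches the UTF-32BE BOM 00 00 FE FF).
    if head.contains(0) {
        throw InputEncodingError.notUTF8(
            hint: "input looks like UTF-16 or UTF-32; Jay expects UTF-8")
    }
}
```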

Yes, I'd go with documenting UTF-8-only input initially and throwing a better error, then adding actual support later.