planetscale/database-js

Retrieved binary is converted to a string

AlexErrant opened this issue · 6 comments

AFAICT, this library does not support retrieving binary as binary.

I added some code inspired by custom type casting function:

function inflate(field, value) {
  console.log("field.type", field.type)
  console.log("value", value)
  console.log("value type", typeof value)
  return cast(field, value)
}

When I do so, here is some representative output of some UUIDs:

field.type BINARY
value ø0sô¼A¤ÿçÐsô
value type string
field.type BINARY
value AD3ADFB038724973
value type string
field.type BINARY
value type string

Yes, the third value is missing - it is seemingly unprintable and silently killed that console.log. This isn't surprising because not all binary is valid utf-8.

Some code, if you want gory details.
create table Media_User
(
    mediaId binary(16)  not null,
    userId  varchar(21) not null,
    primary key (mediaId, userId)
)
    collate = utf8mb4_unicode_ci;

INSERT INTO Media_User (mediaId, userId) VALUES (0x11f830737ff4bc41a4ffe792d073f41f, 'ff')
INSERT INTO Media_User (mediaId, userId) VALUES (0x41443341444642303338373234393733, 'ff')
INSERT INTO Media_User (mediaId, userId) VALUES (0x737295711B1121479411906BBD2F7A34, 'ff')
    import { connect, cast } from "@planetscale/database"

    const config = {
      cast: inflate,
      url: somethngSecret,
    }

    const conn = connect(config)
    const results = await conn.execute("select mediaId from planetScaleDb.Media_User")
    console.log("------------------")
    console.log(results.rows[0])
    console.log(results.rows[1])
    console.log(results.rows[2])

This occurs for me on cloudflare workers (and miniflare).

One good night of sleep later...

Okay I might just be dumb. How the heck are you supposed to mix binary with other content over the wire? (Protobuf?) Nah, the simplest thing would be to just make everything a string, i.e. Base64 it.

select TO_BASE64(mediaId) from planetScaleDb.Media_User

Leaving this open in case there's a better solution/idea.

P.S. aaaaand an insert statement, just to make the roundtrip explicit/easy for anyone searching:

    await conn.execute(
      "INSERT INTO Media_User (mediaId, userId) VALUES (FROM_BASE64(?), ?)",
      ["QUQzQURGQjAzODcyNDk3Mw==", "someUserId"]
    )

Thanks for the follow-up on this! With this, it's a bit tricky since we don't know what the binary value is representing, so it's up to the caller to handle that. I wonder if it makes sense to use a Buffer in this case for sending binary data over the wire? I'd be interested in seeing if Cloudflare supports that and if it works with the API to avoid doing the FROM_BASE64 call.

Cloudflare doesn't support Buffer - it'd need to be a Uint8Array or something. There's also streams, but that's overkill for my binary(16) type. Also not sure if this project (or Vitess) supports streaming responses.

The return value for binary columns is a string containing the bytes from the row. For a web platform environment, like browsers and Cloudflare workers, you can convert it to Uint8Array like this.

function toUint8Array(bytes) {
  return Uint8Array.from(bytes.split('').map(b => b.charCodeAt(0)))
}

function inflate(field, value) {
  switch(field.type) {
    case 'BLOB':
    case 'BIT':
    case 'VARBINARY':
    case 'BINARY':
      return toUint8Array(value)
    default:
      return cast(field, value)
  }
}

Since this library is used in several different JS runtime environments, we just pass the binary string through to the caller rather than wrapping it in a container like Node's Buffer or Uint8Array.

Oh - this is standard MySql.

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they store binary strings rather than nonbinary strings. That is, they store byte strings rather than character strings.

So MySql/vitess/planetscale/database-js returns a string of bytes, which is interpreted as a string of characters. Simply convert the characters to bytes, and bob's your uncle.

Some more notes for posterity:

We can't reverse the process to insert a byte string into the database. You can use String.fromCharCode to generate a byte string, but you'll run into encoding issues if you try to insert it. You'll still need to insert using from_base64 or unhex or UUID_TO_BIN.

Uint8Array.from(bytes.split('').map(b => b.charCodeAt(0))) seems the best way to decode the character string. Don't be tempted to use TextDecoder, which "fails for non printable ASCII chars".

Interestingly...

ECMAScript does not place any restrictions or requirements on the values [of strings] except that they must be 16-bit unsigned integers. Src.

...so you can get strings in Javascript that don't behave like strings. So yeah, "not all binary is valid utf-8", but that doesn't matter in this case since we're not treating it like a normal string. Also, Javascript is UTF-16.

Thanks dgraham!