planetscale/database-js

Simple `show tables` statement not cast correctly by default

cdcarson opened this issue · 11 comments

I believe this is related to PR #164.

Given this...

import { DATABASE_URL } from '$env/static/private';
import { connect } from '@planetscale/database';
export const load = async () => {
  const conn = connect({ url: DATABASE_URL });
  const result = await conn.execute('SHOW TABLES;');
  console.dir(result, { depth: null });
  return { result: 'broken' };
};

... I get this....

{
  headers: [ 'Tables_in_mydbname' ],
  types: { Tables_in_mydbname: 'VARBINARY' },
  fields: [
    {
      name: 'Tables_in_mydbname',
      type: 'VARBINARY',
      table: 'TABLES',
      orgTable: 'tables',
      orgName: 'Tables_in_mydbname',
      columnLength: 256,
      charset: 255,
      flags: 4225
    }
  ],
  rows: [
    {
      Tables_in_mydbname: Uint8Array(7) [
         65, 114, 116,
        105,  99, 108,
        101
      ]
    },
  // etc.
  ]
}

Doing this...

import { DATABASE_URL } from '$env/static/private';
import { connect, cast } from '@planetscale/database';
export const load = async () => {
  const conn = connect({ url: DATABASE_URL, cast: (field, value) => {
    if (field.type === 'VARBINARY' && (value as unknown) instanceof Uint8Array) {
      return new TextDecoder().decode(value as unknown as Uint8Array);
    }
    return cast(field, value);
  }});
  const result = await conn.execute('SHOW TABLES;');
  console.dir(result, { depth: null });
  return { result: 'not broken' };
};

...works, but (a) it's mighty inconvenient typescript (as unknown etc,) (b) it seems like it'd likely produce similar unwanted behavior in other off-the-cuff queries, and (c) it at least requires the shape of cast to be changed to string|null|Uint8Array.

Not sure if this is possible, but it'd be simpler to go back to string|null, and leave it up to userland or ORMs (Prisma filed the original issue) to deal with. I'm aware that it might be more complicated than this, but one can turn a string into a UintArray ...

const arr = new TextEncoder().encode('Hello, I am a string from planetscale');

Thanks!

The problem seems two-fold:

SHOW TABLES; is returning a Uint8Array

Sometimes the default casting doesn't make any sense for a given query. My first thought is that SHOW TABLES; is one of these edge cases where these defaults don't make any sense. Maybe we'll want to have the possibility to opt-out of the default casting and always return a string | null:

conn.execute('SHOW TABLES;', null, { cast: null });
console.dir(result, { depth: null });
{
  headers: [ 'Tables_in_blob' ],
  types: { Tables_in_blob: 'VARBINARY' },
  fields: [
    {
      name: 'Tables_in_blob',
      type: 'VARBINARY',
      table: 'TABLES',
      orgTable: 'tables',
      orgName: 'Tables_in_blob',
      columnLength: 256,
      charset: 255,
      flags: 4225
    }
  ],
  rows: [ { Tables_in_blob: 'binary_test' } ],
  rowsAffected: 0,
  insertId: '0',
  size: 1,
  statement: 'SHOW TABLES;',
  time: 2.002029
}

cast return type is any

For the return type of cast we're more than happy to start typing this more strictly, we simply have not gotten to it. We'll want to strong type Field as well #141. Once we have we prob can type the return value of cast like: if field type is 'INT8' then the return value is a number, if field type is 'BLOB' then the return value is a Uint8Array

Opened an exploratory PR #166 to address the first issue.

For the second issue I believe you already can start doing:

import { connect, cast, type Cast } from '@planetscale/database';

const inflate: Cast = (field, value) => {
  if (field.type === 'VARBINARY' && value instanceof Uint8Array) {
    return new TextDecoder().decode(value as unknown as Uint8Array);
  }
  return cast(field, value);
}

const conn = connect({ url: DATABASE_URL, cast: inflate });

I bet this can be done a bit more intelligently on BINARY/VARBINARY specifically by also starting to check in the charset and flags. I assume those give a hint if it's ASCII or what.

Thanks for reporting the bug @cdcarson, we were able to use the flags like @mattrobenolt said, and this should be fixed in v1.16.0. Give it a try and let us know if you run into new issues.

@ayrton @mattrobenolt Thanks for the quick response and your work.

Unfortunately, I found at least one other query where things that should be string are arrays. (At least IMO.) True, it's another "schema" query, like SHOW TABLES above, but it's nevertheless worrying.

I have a theory about #161 and #164, possibly making all this go away. (Also possibly entirely wrong.) I'll get back to you on that.

Here's the offending case.

-- table def shouldn't matter, but for completeness...
CREATE TABLE `User` ( 
 `id` bigint unsigned NOT NULL AUTO_INCREMENT, 
 `createdAt` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3), 
  PRIMARY KEY (`id`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


SHOW FULL COLUMNS FROM User;

This produces arrays for Type, Default and Comment ....

{
 
  fields: [
    {
      name: 'Field',
      type: 'VARCHAR',
      table: 'COLUMNS',
      orgName: 'Field',
      columnLength: 256,
      charset: 255
    },
    {
      name: 'Type',
      type: 'BLOB',
      table: 'COLUMNS',
      orgTable: 'columns',
      orgName: 'Type',
      columnLength: 67108860,
      charset: 255,
      flags: 4241
    },
    {
      name: 'Collation',
      type: 'VARCHAR',
      table: 'COLUMNS',
      orgName: 'Collation',
      columnLength: 256,
      charset: 255
    },
    {
      name: 'Null',
      type: 'VARCHAR',
      table: 'COLUMNS',
      orgName: 'Null',
      columnLength: 12,
      charset: 255,
      flags: 1
    },
    {
      name: 'Key',
      type: 'BINARY',
      table: 'COLUMNS',
      orgTable: 'columns',
      orgName: 'Key',
      columnLength: 12,
      charset: 255,
      flags: 4481
    },
    {
      name: 'Default',
      type: 'BLOB',
      table: 'COLUMNS',
      orgTable: 'columns',
      orgName: 'Default',
      columnLength: 262140,
      charset: 255,
      flags: 144
    },
    {
      name: 'Extra',
      type: 'VARCHAR',
      table: 'COLUMNS',
      orgName: 'Extra',
      columnLength: 1024,
      charset: 255
    },
    {
      name: 'Privileges',
      type: 'VARCHAR',
      table: 'COLUMNS',
      orgName: 'Privileges',
      columnLength: 616,
      charset: 255
    },
    {
      name: 'Comment',
      type: 'BLOB',
      table: 'COLUMNS',
      orgName: 'Comment',
      columnLength: 24576,
      charset: 255,
      flags: 145
    }
  ],
  rows: [
    {
      Field: 'id',
      Type: [Uint8Array],
      Collation: null,
      Null: 'NO',
      Key: 'PRI',
      Default: null,
      Extra: 'auto_increment',
      Privileges: 'select,insert,update,references',
      Comment: Uint8Array(0) []
    },
    {
      Field: 'createdAt',
      Type: [Uint8Array],
      Collation: null,
      Null: 'NO',
      Key: '',
      Default: [Uint8Array],
      Extra: 'DEFAULT_GENERATED',
      Privileges: 'select,insert,update,references',
      Comment: Uint8Array(0) []
    }
  ],
}

This specific case may be a simple fix, just put 'BLOB' through your isText check like 'VARBINARY' and 'BINARY'.

database-js/src/index.ts

Lines 408 to 414 in 0152910

case 'BLOB':
case 'BIT':
case 'GEOMETRY':
return uint8Array(value)
case 'BINARY':
case 'VARBINARY':
return isText(field) ? value : uint8Array(value)

Thanks again!

That does appear true. I thought BLOB was reserved for exclusively binary only data to not even be attempted to be interpreted as text, but I guess I was wrong. This seems reasonable to me to trust the flags as authority. This whole chunk should probably be redone a bit in light of all this.

What's interesting is that the BLOB also contains a charset. I definitely thought they explicitly did not. I wonder if this is potentially a deviation of what Vitess does. In any case, I think it's fair to just look at the flags.

My "theory" is that the library doesn't really need to convert to an array. It's real simple to convert a buffer (say read from a file) to an encoded string, rather than passing the buffer. There may be downsides, but I think it's reasonable.

Working example repo using 1.14 of this lib and base64 encoding:

https://github.com/cdcarson/pscale-text-encode-reproduction/blob/main/src/case-1.ts

What's interesting is that the BLOB also contains a charset.

SHOW CHARACTER SET LIKE 'binary'

I think this is the default 'binary' pseudo character set. I don't think you can set it on a column.

@cdcarson thank you, we'll have a closer look