planetscale/database-js

Numbers returned as strings

twiddler opened this issue · 15 comments

In the following test suite, when using PlanetScaleDialect and receiving more than one row, what I expected to be numbers is parsed as strings instead:

import { Kysely, MysqlDialect } from "kysely";
import { PlanetScaleDialect } from "kysely-planetscale";
import { createPool } from "mysql2";
import { strict as assert } from "node:assert";
import { describe, test } from "node:test";
import { fetch } from "undici";

describe("mysql", function () {
  const db = new Kysely({
    dialect: new MysqlDialect({
      pool: createPool({
        database: "mysql",
        host: "localhost",
        user: "root",
        password: "",
      }),
    }),
  });

  test("one row", async function () {
    const q = await db
      .selectFrom([
        db.selectNoFrom((eb) => eb.val(0).as("num_column")).as("foo"),
      ])
      .selectAll()
      .execute();

    assert.deepStrictEqual(q, [{ num_column: 0 }]); // ✅
  });

  test("two rows", async function () {
    const q = await db
      .selectFrom([
        db
          .selectNoFrom((eb) => eb.val(0).as("num_column"))
          .unionAll(db.selectNoFrom((eb) => eb.val(0).as("num_column")))
          .as("foo"),
      ])
      .selectAll()
      .execute();

    assert.deepStrictEqual(q, [{ num_column: 0 }, { num_column: 0 }]); // ✅
  });
});

describe("planetscale", function () {
  const db = new Kysely({
    dialect: new PlanetScaleDialect({
      url: process.env.PLANETSCALE_URL,
      fetch,
    }),
  });

  test("one row", async function () {
    const q = await db
      .selectFrom([
        db.selectNoFrom((eb) => eb.val(0).as("num_column")).as("foo"),
      ])
      .selectAll()
      .execute();

    assert.deepStrictEqual(q, [{ num_column: 0 }]); // ✅
  });

  test("two rows", async function () {
    const q = await db
      .selectFrom([
        db
          .selectNoFrom((eb) => eb.val(0).as("num_column"))
          .unionAll(db.selectNoFrom((eb) => eb.val(0).as("num_column")))
          .as("foo"),
      ])
      .selectAll()
      .execute();

    assert.deepStrictEqual(q, [{ num_column: 0 }, { num_column: 0 }]); // ❌ actual: [ { num_column: '0' }, { num_column: '0' } ]
  });
});

Gist for reproducing: https://gist.github.com/twiddler/8f98b48ec61c75eaa5422249c1edebc9. Please set PLANETSCALE_URL in .env and use Node 21 or later.

Is this expected behavior?

I suspect it's because your schema contains bigints or some other unsigned 64 bit integer which isn't representable in JavaScript.

We have an example specifically for doing your own casting to a JS BigInt type. https://github.com/planetscale/database-js?tab=readme-ov-file#custom-type-casting-function

The proposal to use BigInts directly was rejected. #90

https://github.com/planetscale/database-js?tab=readme-ov-file#custom-type-casting-function

As linked above, you can use a custom casting function for addressing this. Rather than forcing usage of BigInt, you can opt into it.

@mattrobenolt There's no schema involved. I am doing

SELECT * FROM
  (
    SELECT 0 AS "num_column"
    UNION ALL
    SELECT 0 AS "num_column"
  ) AS "foo"

(Sorry if that wasn't clear from the code. I originally posted this on kysely-planetscale.)

What puzzles me is that the type changes depending on how many lines the result set has. It'd be fine if it was always BigInts passed as strings. But if it's sometimes strings, sometimes numbers it makes it really hard to trace errors.

Oh, I did miss that. Lemme see what MySQL is doing here.

Ok, so.

$ curl -s -u $MYSQL_USER:$MYSQL_PWD https://$MYSQL_HOST/psdb.v1alpha1.Database/Execute -H'content-type: application/json' -d'{"query": "SELECT * FROM ( SELECT 0 AS `num_column` ) AS `foo`"}' | jq .result.fields[0]
{
  "name": "num_column",
  "type": "INT32",
  "table": "foo",
  "orgName": "num_column",
  "columnLength": 2,
  "charset": 63,
  "flags": 32769
}
$ curl -s -u $MYSQL_USER:$MYSQL_PWD https://$MYSQL_HOST/psdb.v1alpha1.Database/Execute -H'content-type: application/json' -d'{"query": "SELECT * FROM ( SELECT 0 AS `num_column` UNION ALL SELECT 0 AS `num_column` ) AS `foo`"}' | jq .result.fields[0]
{
  "name": "num_column",
  "type": "INT64",
  "table": "foo",
  "orgName": "num_column",
  "columnLength": 2,
  "charset": 63,
  "flags": 32769
}

When selecting 1 row, the type we get back is INT32, and when 2 rows, it's INT64.

This is... very interesting. I'm sure it's clear to you now at least why we behave this way in the driver, since an INT32 is safely an integer, but INT64, we need to keep it as a string or convert to BigInt, but either way, we're getting a different type.

I'm going to dig in internally, this might be more of a Vitess this? I'm not sure why it'd behave slightly differently.

Just to be explicit, the queries I'm testing are:

SELECT * FROM (
  SELECT 0 AS `num_column`
) AS `foo`

and

SELECT * FROM (
  SELECT 0 AS `num_column`
  UNION ALL
  SELECT 0 AS `num_column`
) AS `foo`

Ok, so this is pretty wild. Against a vanilla mysqld 8.0.34, this is what happens:

mysql> SELECT * FROM ( SELECT 0 AS `num_column` ) AS `foo`;
Field   1:  `num_column`
Catalog:    `def`
Database:   ``
Table:      `foo`
Org_table:  ``
Type:       LONG
Collation:  binary (63)
Length:     2
Max_length: 1
Decimals:   0
Flags:      NOT_NULL NUM


+------------+
| num_column |
+------------+
|          0 |
+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM ( SELECT 0 AS `num_column` UNION ALL SELECT 0 AS `num_column` ) AS `foo`;
Field   1:  `num_column`
Catalog:    `def`
Database:   ``
Table:      `foo`
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     2
Max_length: 1
Decimals:   0
Flags:      NOT_NULL NUM


+------------+
| num_column |
+------------+
|          0 |
|          0 |
+------------+
2 rows in set (0.01 sec)

I can't explain this behavior, but clearly MySQL wants to make the second query LONGLONG and the first just a LONG. Which is 64bit vs 32bit integer.

So both PlanetScale and database-js are handling this query correctly to how MySQL behaves.

I think in this case, the query is just too ambiguous and might benefit from a specific CAST() to yield the correct type, but I sorta suspect this query is just trying to exercise some behavior since this isn't something I think you'd do in practice?

But whatever is happening, we are handling this correctly according to how MySQL behaves.

If anything, the best we'd be able to do here is handle this type as a BigInt with a custom cast function, but then they'd still be different types. One as a normal JS number and other as a BigInt.

this isn't something I think you'd do in practice?

That rather synthetic example is the essence of the following use case:

I am left joining two tables A and B. Since it's a left join, there might be rows in the result set that have NULL values for B.foo. For those rows, I want to know whether B.foo is NULL because there was 1) no match, or 2) B.foo is actually NULL. I solved it like this:

SELECT
  B.foo,
  B.aId IS NOT NULL as matchFound
FROM
  A LEFT JOIN B
ON
  A.id = B.aId;

The IS NOT NULL evaluates to either 0 or 1, which--contrary to what I expected but just like the query in my first post--gets handed back as a number if there's just one row, and strings when there's more rows.

I think PlanetScale should not do any extra lifting here, but just behave like vanilla MySQL. It didn't for me in my example. I tested against community-mysql-server version 8.0.36 (see https://docs.fedoraproject.org/en-US/quick-docs/installing-mysql-mariadb/).

This example is now different though than what you first produced. The first examples yield a "derived table" which is what's behaving oddly.

Can you give me a schema that you're working with for this latter example actually doing a LEFT JOIN? I'm curious to see the data types Vitess is returning vs MySQL. It might be that we're doing something wrong here.

I tried to reproduce with the following:

CREATE TABLE A (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    PRIMARY KEY(id)
);

CREATE TABLE B (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    aId BIGINT UNSIGNED,
    foo TEXT,
    PRIMARY KEY(id),
    KEY(aId)
);

INSERT INTO A VALUES (NULL);
INSERT INTO A VALUES (NULL);
INSERT INTO B VALUES (NULL, 1, 'hello');

SELECT
    B.foo,
    B.aId IS NOT NULL AS matchFound
FROM
    A LEFT JOIN B
ON
    A.id = B.aId;

This setup seems to mirror what you're trying to accomplish.

Against PlanetScale, I get:

mysql> SELECT
    ->     B.foo,
    ->     B.aId IS NOT NULL AS matchFound
    -> FROM
    ->     A LEFT JOIN B
    -> ON
    ->     A.id = B.aId;
Field   1:  `foo`
Catalog:    `def`
Database:   `mattdb`
Table:      `B`
Org_table:  `B`
Type:       BLOB
Collation:  utf8mb4_0900_ai_ci (255)
Length:     262140
Max_length: 5
Decimals:   0
Flags:      BLOB

Field   2:  `matchFound`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     1
Max_length: 1
Decimals:   0
Flags:      NOT_NULL BINARY NUM

And against vanilla MySQL 8.0.34:

mysql> SELECT
    ->     B.foo,
    ->     B.aId IS NOT NULL AS matchFound
    -> FROM
    ->     A LEFT JOIN B
    -> ON
    ->     A.id = B.aId;
Field   1:  `foo`
Catalog:    `def`
Database:   `mysql`
Table:      `B`
Org_table:  `B`
Type:       BLOB
Collation:  utf8mb4_general_ci (45)
Length:     262140
Max_length: 5
Decimals:   0
Flags:      BLOB

Field   2:  `matchFound`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     1
Max_length: 1
Decimals:   0
Flags:      NOT_NULL BINARY NUM


+-------+------------+
| foo   | matchFound |
+-------+------------+
| hello |          1 |
| NULL  |          0 |
+-------+------------+
2 rows in set (0.00 sec)

If I delete a row, so there's only 1:

--- PlanetScale
mysql> delete from A where id=2;
Query OK, 1 row affected (0.11 sec)

mysql> SELECT
    ->     B.foo,
    ->     B.aId IS NOT NULL AS matchFound
    -> FROM
    ->     A LEFT JOIN B
    -> ON
    ->     A.id = B.aId;
Field   1:  `foo`
Catalog:    `def`
Database:   `mattdb`
Table:      `B`
Org_table:  `B`
Type:       BLOB
Collation:  utf8mb4_0900_ai_ci (255)
Length:     262140
Max_length: 5
Decimals:   0
Flags:      BLOB

Field   2:  `matchFound`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     1
Max_length: 1
Decimals:   0
Flags:      NOT_NULL BINARY NUM


+-------+------------+
| foo   | matchFound |
+-------+------------+
| hello |          1 |
+-------+------------+
1 row in set (0.10 sec)
--- MySQL 8.0.34
mysql> delete from A where id=2;
Query OK, 1 row affected (0.02 sec)

mysql> SELECT
    ->     B.foo,
    ->     B.aId IS NOT NULL AS matchFound
    -> FROM
    ->     A LEFT JOIN B
    -> ON
    ->     A.id = B.aId;
Field   1:  `foo`
Catalog:    `def`
Database:   `mysql`
Table:      `B`
Org_table:  `B`
Type:       BLOB
Collation:  utf8mb4_general_ci (45)
Length:     262140
Max_length: 5
Decimals:   0
Flags:      BLOB

Field   2:  `matchFound`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     1
Max_length: 1
Decimals:   0
Flags:      NOT_NULL BINARY NUM


+-------+------------+
| foo   | matchFound |
+-------+------------+
| hello |          1 |
+-------+------------+
1 row in set (0.00 sec)

In all cases, matchFound is a LONGLONG which is a 64 bit integer.

Am I missing something here? This would yield a string in all cases through our driver.

@twiddler one thing to watch for: I think the mysql2 client is going to type-cast values as either numbers or strings based on the length of the string returned, basically if the string value of the number is short enough, it'll automatically parse it into a number:

https://github.com/sidorares/node-mysql2/blob/225a2bdb6e16ee201121d4a8d36508a792ff6e54/lib/packets/packet.js#L433-L487

There's some pool options that control that behavior (e.g. setting bigNumberStrings: true will mean that the LONGLONG always gets returned as a string), but that might be the source of the difference here.

Yeah, I had considered something like this where we convert to an integer if we know it's within the safe bounds, but hard to say objectively what's better, because then you just get a string at some point when you've crossed the threshold and your code doesn't anticipate that.

I think in this case tho, the answer is probably supplying a custom cast function that handles this how you like. But from my understanding of everything here, the types are the same on the protocol level. We just choose to handle LONGLONG types a strings and hands-off on guessing what to do with it, while mysql2 has knobs to tune the behavior.

I think this is my complete understanding at this point.

Thanks for all your hard work, and sorry for getting back so late. 🙏

the mysql2 client is going to type-cast values as either numbers or strings based on the length of the string returned

That would have explained the behaviour, but even with

      pool: createPool({
        database: "mysql",
        host: "localhost",
        user: "root",
        password: "",
+       bigNumberStrings: true,
      }),

the tests in my opening post still pass:

▶ mysql
  ✔ one row (30.003698ms)
  ✔ two rows (1.971407ms)
▶ mysql (32.916212ms)

Expecting strings fails:

-   assert.deepStrictEqual(q, [{ num_column: 0 }, { num_column: 0 }]);
+   assert.deepStrictEqual(q, [{ num_column: "0" }, { num_column: "0" }]);
▶ mysql
  ✔ one row (33.345367ms)
  ✖ two rows (5.137187ms)
    AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:

Can you give me a schema that you're working with for this latter example actually doing a LEFT JOIN?

CREATE TABLE `A` (
  `id` varchar(191) NOT NULL,
  PRIMARY KEY (`id`),
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;

CREATE TABLE `B` (
  `id` varchar(191) NOT NULL,
  `aId` varchar(191) NOT NULL,
  `foo` enum('inherit', 'true', 'false') NOT NULL DEFAULT 'inherit',
  PRIMARY KEY (`id`, `aId`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;

And the query (for convenience, same as above):

SELECT
  B.foo,
  B.aId IS NOT NULL as matchFound
FROM
  A LEFT JOIN B
ON
  A.id = B.aId;

So while I am still confused why mysql2 didn't behave as expected when bigNumberStrings: true, I am sure that when it comes to PlanetScale I can solve my use case with a custom cast function. (In my application, I see no reason to just Number(...) any numeric value and then assert it is <= Number.MAX_SAFE_INTEGER--which is ridiculously huge anyway.)

I can't figure out why MySQL treats IS NOT NULL as an INT64, but that question is certainly out of scope here. CAST does not accept any smaller number type, so I'll select B.aId and z.transform null to false and everything else to true.

Thanks for all your input. This was definitely an interesting ride. 🎢

Yeah, I can't explain why MySQL behaves that way either. Or why mysql2 infers this. But they definitely do extra work to try and coerce numbers. Maybe it's something we can explore too? But that seems like it'd have unexpected behaviors too. When you do cross the boundary, now you get a string.

I can easily see the case where you have a bigint primary key, everything is working for a long time. Eventually you cross the threshold and now you get strings back and code is broken. I'm not sure that'd be a good experience either.

I'd personally recommend, like you suggested, a custom cast function or something for you to explicitly handle this how you want to.

I think we as a library are handling this as correctly as we can in JavaScript until we have some evidence otherwise.