sidorares/node-mysql2

RFC: deprecate typeCast and introduce resultsMapper ( exact name TBD )

sidorares opened this issue ยท 14 comments

There are few issues with typeCast currently, and most of them better solved by moving it to a different level where it sits in the driver, I think it applies both to mysql2 / mysqljs/mysql

Problems I'd like to solve with the change:

  1. most important - typeCast sits too low and is too powerful. It's super easy to make connection go into completely undefined / unpredictable state by reading incorrect number of bytes
  2. performance: mysql2 generates row parser JS code in the run time and just the fact that you have custom typeCast function makes generated parser slower and more likely to bail out from optimisations
  3. difficult to implement for execute(). typeCast in its current form expects user to know binary serialization format of results and data for execute() if often encoded differently from data for query()
  4. one can't join 2 rows into one etc as typeCast is called before every column of a row is read

Most importantly: almost every use case I see is "map rows from default types into something that works best for our use case" and not "I wan't to change / extend parsing logic".

Proposed solution

Introduce resultsMapper config function (happy to hear better name suggestions). If set, it's applied to every row and result of the function is then used as returned row.

signature:

function resultsMapper(row: Row, fields: Array<Column>) => CustomRow

Deprecation strategy

  • soft deprecate typeCast - note in the documentation
  • next step - decide if we want to remove typeCast completely - any use cases that can't be efficiently solved without having typeCast. If nothing, add deprecation message in the code
  • finally, leave a stub throwing an error and pointing to documentation/migration guide

CC ( not sure if I'm tagging correct people, if you know who is better suited to comment please let me know )
sequelize - @sushantdhiman , @papb
knex - @tgriesser , @kibertoad
mysqljs/mysql @dougwilson

Although I am not currently maintaining Sequelize, I think apart from some refactor work in Sequelize this new implementation can be adopted without much effort, given that Column exposes underlying data type (string or symbol).

Note : (for sequelize team) this line will be changed to this new API. One issue could be custom buffer parsing for Geometry type

cc @sdepold , @fncolon current maintainers of sequelize

testn commented

Also please think of how we can make the roundtrip of the parameter as well. Let's say we use tinybit to represent a boolean value, how can we take a boolean value to insert it back into the database.

Also please think of how we can make the roundtrip of the parameter as well. Let's say we use tinybit to represent a boolean value, how can we take a boolean value to insert it back into the database.

I didn't think about that being a symmetric part of typeCast / resultsMapper but now start to think maybe it's a good idea? parametersMapping ({ value, parameterTypeHint, connection }) => { value, mysqlType }? ( example possible default parameter mapper function -

function toParameter(value, encoding, timezone, parameterHints) {
)

testn commented

If you want to do something like that, I think it will be much more performant if we allow the client to map only specific type. Unmapped types would be serialized/deserialized with the default logic without calling into the custom code every time.

Hello ๐Ÿ‘‹.

Is this still being pursued? We're grappling with normalizing the results of our API to all boolean literals instead of 1/0 and would really appreciate this for handling the results of our prepared statements for TINYINT(1) columns.

@Parsonswy what's currently stopping you from using typeCast? Not having in in .execute() results?

This is still being pursued though at a relatively low priority, happy to put a bit more attention if there is enough interest

@Parsonswy what's currently stopping you from using typeCast? Not having in in .execute() results?

This is still being pursued though at a relatively low priority, happy to put a bit more attention if there is enough interest

Thank you for the quick response. Yes, .execute()/binary protocol. I was under the impression there isn't a way to use typeCast with .execute()/prepared statements.

I was under the impression there isn't a way to use typeCast with .execute()/prepared statements.

correct @Parsonswy , there isn't currently. The way typecast works is not 100% compatible between protocols if we just add it to execute.

typeCast callbacks are allowed to call .string() / .buffer() and .geometry() helper on the unserialised field object. With text protocol most types are sent as strings ( dates, numeric, string, boolean ) where .execute() results serialise them using corresponding binary representations ( IEEE 754 for the double / single precision floats, custom encoding for date etc ). Worse, if you call incorrect deserieliser you'll end up in the invalid state where read position could be in the middle of the next field or past end of packet. This is the main reason I want to sunset current typeCast api

Ope possible option is to have .value() helper that reads using built in (correct) deserialiser, and then your custom code maps to anything you want

That makes sense @sidorares.

A general .value() call seems sensible to me and supports our specific use case. Is this change more attainable/attractive now compared to adding something like the resultsMapper API? I am not sure how silly it feels to add .value() knowing that removing typeCast() is backlogged.

Could .value() introduce a bc break too? I think you were implying that adding .value() would mean typeCast would be invoked when handling binary results. Currently it is valid and normal call .string() on a "numeric" result cell within a textual result set, but when dealing with binary results that call could (would?) be problematic.

I'd be willing to help with an implementation for either of these approaches after some of these details are ironed out and adoption green lit. I am not sure what the support threshold is for adopting these changes.

I am not sure how silly it feels to add .value() knowing that removing typeCast() is backlogged.

If we go that route we'll probably keep typeCast and only deprecate .string() / .number() etc methods
I think .value() ticks issue 1 and 3. not sure how important 2 and 4 are ( would be good to be able to control behaviour like rowsAsArray, nestTables ). Example scenario: I'm getting x, y float rows but want to have a single Point object instead under some different name.

Hi @Parsonswy, it's been a while, but I'd really appreciate it if you could check out PR #2398.
I'd like to hear positive or negative feedbacks too ๐Ÿ™‹๐Ÿปโ€โ™‚๏ธ

As I said there, I believe that as long as typeCast is still used, it's really useful that it works for both methods.

Hi @Parsonswy, it's been a while, but I'd really appreciate it if you could check out PR #2398. I'd like to hear positive or negative feedbacks too ๐Ÿ™‹๐Ÿปโ€โ™‚๏ธ

As I said there, I believe that as long as typeCast is still used, it's really useful that it works for both methods.

Sorry I did not see this back in January, @wellwelwel.

Now that typeCast is invoked on execute results, existing typecast implementations can break in subtle ways. For example, my company had been using typeCast to handle TINYINT(1) => boolean conversion, but this hook now breaks on binary results.

typeCast: (field, next) => {
	if (field.type === 'TINY' && field.length === 1) {
		return field.string() === '1';
	}
	return next();
}

readLengthCodedString() will trip on any boolean columns in the packet and mangle results or causes any number of strange connection / out of bound read states.

It seems like the general conscientious is still to deprecate typecast because these sorts of issues are inherent when this level of control is available, but as it stands this is sort of a bc break since the API behavior changed. I don't think it is possible for the hook to know if it is operating on binary or text results and choose to skip or change read* methods. Even if it was, all the existing .read() aliases on the field wrapper are for length-coded types so there isn't a way to parse fixed-width fields like TINYINT.

I think the .value() approach that @sidorares mentioned about could solve this issue, along with deprecating the current wrapper methods to reduce was for the caller to brick the connection. I've prepared a PR that implements this approach for discussion and to unblock at least my use case immediately.

What if we have a bit higher and call reduceRows:

  • receives array of current columns + array of column definitions + result of previous reduceRows call
  • whatever is returned becomes result of .query() / execute()

This api would allow flexibility to configure base object for a row ( #2090 (comment) @nachogiljaldo ). It also makes possible to combine multiple column into single mapped object ( Point from a pair or doubles etc ). It also allows to emulate nestTables behaviour, as well as things like for example "skip consecutive identical rows"