kriszyp/cbor-x

Behavior of tagUint8Array

Closed this issue · 3 comments

Hi there. I have a client-server application that makes heavy use of binary data (many small buffers, sometimes really big ones) as Uint8Array. While debugging some other issues I noticed that if tagUint8Array is not configured in the encoder options, cbor-x by default encodes tag 64 on Node but not on the client/frontend:

getTag(typedArray) {
	if (typedArray.constructor === Uint8Array) {
		if (this.tagUint8Array || hasNodeBuffer && this.tagUint8Array !== false)
			return 64;
	} // else no tag
}
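To make the defaults easier to see, here is a minimal standalone restatement of the quoted condition. It is a sketch only: `wouldTagUint8Array` is a hypothetical name, and `options` and `hasNodeBuffer` are passed in explicitly rather than read from the encoder as in the library.

```javascript
// Hypothetical restatement of the quoted condition, for illustration only.
function wouldTagUint8Array(options, hasNodeBuffer) {
  // `&&` binds tighter than `||`, so the quoted condition reads as:
  //   tagUint8Array || (hasNodeBuffer && tagUint8Array !== false)
  return Boolean(
    options.tagUint8Array ||
    (hasNodeBuffer && options.tagUint8Array !== false)
  );
}

console.log(wouldTagUint8Array({}, true));                        // true  (Node, option unset)
console.log(wouldTagUint8Array({}, false));                       // false (browser, option unset)
console.log(wouldTagUint8Array({ tagUint8Array: false }, true));  // false (Node, explicitly off)
console.log(wouldTagUint8Array({ tagUint8Array: true }, false));  // true  (browser, explicitly on)
```

So an unset option behaves like `true` where Buffer exists and like `false` elsewhere, which is why the two sides produce different bytes unless the option is set explicitly.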

I am not 100% sure why tag 64 is the default when hasNodeBuffer is set, but I guess it is because the data would otherwise decode into a Node Buffer. That pointed me to an interesting fact: with tagUint8Array = false, the decoded byte arrays are created as views over the source buffer (no copy, assuming copyBuffers = false), while with tagUint8Array = true (the default behavior in Node, as per above) the extension code always creates a new copy of the data. For my use case it would therefore be highly desirable to use tagUint8Array = false to avoid copying data needlessly (the source buffer is almost always from an HTTP request or a transient object). On the other hand, having Buffer around instead of Uint8Array has caused me some trouble in the past.
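The view-versus-copy distinction can be shown with plain typed arrays, independent of cbor-x. In this sketch `source` stands in for the encoded message buffer; `subarray` creates a view and `slice` creates a copy.

```javascript
// A stand-in for an encoded message held in one source buffer.
const source = new Uint8Array([10, 20, 30, 40, 50]);

// View: shares memory with `source`, no bytes are copied.
const view = source.subarray(1, 4);

// Copy: TypedArray.prototype.slice allocates new memory.
const copy = source.slice(1, 4);

source[2] = 99; // mutate the source after the "decode"

console.log(view[1]); // 99, the view observes the change
console.log(copy[1]); // 30, the copy does not
console.log(view.buffer === source.buffer); // true
console.log(copy.buffer === source.buffer); // false
```

The view is cheaper, but it is only safe while nothing reuses or mutates the source buffer.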

So the options I see are:

  • use { tagUint8Array : false } on both server and client (where it is already the default behavior). The PRO is that buffer data is not copied, as it's just a view over the source, and the CBOR encoded by both client and server is the same. The CON is that a decoded Uint8Array will most likely be a Buffer on Node. It also feels less spec-compliant.
  • use { tagUint8Array : true } on both server (where it is already the default behavior) and client. The PRO is that decoded byte arrays are no longer instances of Buffer, and the CBOR encoded by both client and server is the same. This also seems more robust and spec-compliant. The CON is that data is always duplicated when decoded, which is really a waste in my case.
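For reference, the two options differ on the wire only by the two-byte tag prefix. The sketch below hand-encodes a short byte string following RFC 8949; `encodeShortBytes` and `toHex` are hypothetical helpers written for this illustration, not cbor-x's encoder.

```javascript
// Hand-rolled encoding of a short byte string (length < 24) per RFC 8949.
function encodeShortBytes(bytes, withTag64) {
  if (bytes.length >= 24) throw new RangeError('sketch handles fewer than 24 bytes only');
  const head = withTag64
    ? [0xd8, 0x40, 0x40 | bytes.length] // tag(64), then byte-string header
    : [0x40 | bytes.length];            // byte-string header only
  return Uint8Array.from([...head, ...bytes]);
}

// Format bytes as space-separated two-digit hex.
const toHex = (u8) =>
  Array.from(u8, (b) => b.toString(16).padStart(2, '0')).join(' ');

const payload = new Uint8Array([1, 2, 3]);
console.log(toHex(encodeShortBytes(payload, false))); // 43 01 02 03
console.log(toHex(encodeShortBytes(payload, true)));  // d8 40 43 01 02 03
```

Both outputs are valid CBOR; the untagged form is a plain byte string, and the tagged form additionally marks it as a uint8 typed array.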

Questions:

  • Do you see any issues in using { tagUint8Array : false } across the board besides Uint8Array getting encoded as Buffer in node?
  • Is there a way to override the Uint8Array extension so that 1) the tag 64 is always encoded and 2) decoding does NOT copy the data over but returns Uint8Array as a view of the source (this would be the best of both worlds for me)?
  • Wouldn't it make sense to introduce a configuration that would not use Buffer on node at all (I hacked something but that didn't work for some reason, so maybe that's not really an option for now)?
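As a possible workaround for the Buffer concern, a decoded value could be post-processed so that every Buffer is re-wrapped as a plain Uint8Array over the same memory. This is only a sketch: `toPlainUint8Array` and `normalizeBinary` are hypothetical helpers, not part of cbor-x, and the walker handles only arrays and plain objects (not Map or Set).

```javascript
// Hypothetical helper: re-wrap a Buffer as a plain Uint8Array without copying.
function toPlainUint8Array(value) {
  if (typeof Buffer !== 'undefined' && Buffer.isBuffer(value)) {
    return new Uint8Array(value.buffer, value.byteOffset, value.byteLength);
  }
  return value;
}

// Hypothetical walker over decoded data made of arrays and plain objects.
function normalizeBinary(value) {
  if (Array.isArray(value)) return value.map(normalizeBinary);
  if (value && typeof value === 'object' &&
      Object.getPrototypeOf(value) === Object.prototype) {
    for (const key of Object.keys(value)) {
      value[key] = normalizeBinary(value[key]);
    }
    return value;
  }
  return toPlainUint8Array(value);
}

const decoded = { id: 1, blob: Buffer.from([1, 2, 3]) };
const normalized = normalizeBinary(decoded);
console.log(normalized.blob.constructor === Uint8Array); // true
console.log(Buffer.isBuffer(normalized.blob));           // false
```

The walk costs time proportional to the size of the decoded structure, but no byte payload is duplicated.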

Many thanks for your help and for this great library

I think it should be fine to use tagUint8Array: false, and I believe that if you pass a Uint8Array to decode, it should return Uint8Arrays for binary data values.
I have also made a commit to try to avoid copying data for typed arrays based on data alignment (so with copyBuffers: false, Uint8Array should never need to copy).
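The alignment idea can be sketched as follows. This is not the code from the commit; `viewOrCopy` is a hypothetical helper that returns a view when the byte offset is a multiple of the element size and falls back to a copy otherwise. Because Uint8Array has an element size of 1, it always takes the view path.

```javascript
// Hypothetical sketch: create a typed array over `source` bytes, copying
// only when the element type's alignment requirement is not met.
function viewOrCopy(TypedArrayCtor, source, start, byteLength) {
  const elementSize = TypedArrayCtor.BYTES_PER_ELEMENT;
  const absoluteOffset = source.byteOffset + start;
  const length = Math.floor(byteLength / elementSize);
  if (absoluteOffset % elementSize === 0) {
    // Aligned: a view over the source memory is allowed.
    return new TypedArrayCtor(source.buffer, absoluteOffset, length);
  }
  // Misaligned: copy the bytes into fresh memory, which starts at offset 0.
  const copied = new Uint8Array(byteLength);
  copied.set(source.subarray(start, start + byteLength));
  return new TypedArrayCtor(copied.buffer, 0, length);
}

const source = new Uint8Array(16);
const u8 = viewOrCopy(Uint8Array, source, 3, 8);
const f32Aligned = viewOrCopy(Float32Array, source, 4, 8);
const f32Misaligned = viewOrCopy(Float32Array, source, 3, 8);
console.log(u8.buffer === source.buffer);            // true
console.log(f32Aligned.buffer === source.buffer);    // true
console.log(f32Misaligned.buffer === source.buffer); // false
```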

I think it should be fine to use tagUint8Array: false

Awesome thank you

I have also made a commit to try to avoid copying data for typed arrays based on data alignment (so with copyBuffers: false, Uint8Array should never need to copy)

You read my mind - that's actually what I was about to propose.

Thank you again
