kriszyp/cbor-x

Support for cbor-x on a plain V8 JavaScript engine

shubhamvrkr opened this issue · 18 comments

Hi @kriszyp

I am trying to use this package in an environment other than Node and the browser, one that has no global object or any browser/Node-specific functions.
Executing on V8 gives an undefined error for the global object referenced here: https://github.com/kriszyp/cbor-x/blob/master/decode.js#L825

It looks like the glbl object could be replaced with a plain map from type names to the native constructors. Can we do something like this, or am I missing something?

let glbl = {
  'Uint8Array': Uint8Array,
  'Uint8ClampedArray': Uint8ClampedArray,
  'Uint16Array': Uint16Array,
  'Uint32Array': Uint32Array,
  'BigUint64Array': BigUint64Array,
  'Int8Array': Int8Array,
  'Int16Array': Int16Array,
  'Int32Array': Int32Array,
  'BigInt64Array': BigInt64Array,
  'Float32Array': Float32Array,
  'Float64Array': Float64Array,
  'RegExp': RegExp
};

I suppose we could do something like that. There is a little more to it; BigInt isn't available on all supported platforms, and we had also been using glbl.RegExp, glbl.Error, and glbl.Set. I'll see what I can do, though.
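For reference, a rough sketch of what such a lookup could look like with the BigInt-dependent constructors guarded (just an illustration, not the actual change that landed):

let glbl = {
  Uint8Array, Uint8ClampedArray, Uint16Array, Uint32Array,
  Int8Array, Int16Array, Int32Array,
  Float32Array, Float64Array,
  RegExp, Error, Set,
  // BigInt-backed typed arrays may not exist on every engine, so guard them
  BigUint64Array: typeof BigUint64Array !== 'undefined' ? BigUint64Array : undefined,
  BigInt64Array: typeof BigInt64Array !== 'undefined' ? BigInt64Array : undefined
};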

Sure @kriszyp, thanks! Kindly share any progress; it's kind of urgent.

Hi @kriszyp, I am facing an issue while encoding content. It looks like this is a bug. With the cbor-web lib it works, but I can't use cbor-web in our environment. I am using 1.3.3 of cbor-x.

Here's a code snippet:

const p = hexStringToUint8Array("a10105")
const payload = hexStringToUint8Array("a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71");

const MACstructure2 = [
  'MAC0',              // 'MAC0' or 'MAC1', // context
  p.buffer,            // protected
  new ArrayBuffer(0),  // bstr,
  payload.buffer
];
const tobeMaced2 = cbor.encode(MACstructure2)
const tobeMaced2CborWeb = cborweb.encode(MACstructure2)

I get different output from the two libraries:

tobeMaced2: 84644d41433043006000405850856fbb7f00000800000001000000000000000000000078ca856fbb7f0000000000000000000088ca856fbb7f0000000000000000000098ca856fbb7f00008804a06dbb7f0000ffffffffffffffff0000
tobeMaced2CborWeb: 84644d41433043a10105405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71

Looks like cbor-x is not able to read ArrayBuffer content. However, if I pass a Uint8Array instead of an ArrayBuffer I get the same output from both libs, but that output has changed from the original one.

const MACstructure2 = [
  'MAC0',             // 'MAC0' or 'MAC1', // context
  p,                  // protected
  new Uint8Array(0),  // bstr,
  payload
];

tobeMaced2: 84644d414330d84043a10105d84040d8405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71
tobeMaced2CborWeb: 84644d414330d84043a10105d84040d8405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71

This becomes an issue when the encoding was done in Node.js or any other language and the result is decoded by this library. So I guess cbor-x needs to handle reading from ArrayBuffer. Let me know your thoughts on this.

A simpler example:

const { encode } = require('cbor-x');

const data = Buffer.from("a10105", "hex");
var a = new Uint8Array(data.length);
for (var i = 0; i < data.length; i++) a[i] = data[i];
encode(data).toString('hex');     // 43a10105 => correct
encode(a.buffer).toString('hex'); // 43600000 => wrong
encode(a).toString('hex');        // d84043a10105 => correct for Uint8Array

@kriszyp Also, if I toggle this: const hasNodeBuffer = typeof Buffer === 'undefined', the encoding of a Uint8Array does not append the d840 tag, which gives inconsistent behaviour. I think when the input is given as an ArrayBuffer in JS we shouldn't append the d840 tag, but when it's a Uint8Array we should add it. Thoughts?

Yes, you are right, ArrayBuffers were not being properly serialized. I have committed a fix for this and removed reliance on the global object.

Also, if I toggle this: const hasNodeBuffer = typeof Buffer === 'undefined', the encoding of a Uint8Array does not append the d840 tag.

That is because the most common class for holding binary data in Node.js is the Buffer (returned by most Node APIs that return binary data), whereas in the browser, Uint8Array is the most common class for binary data (Buffer isn't available). So each of these is used as the preferred class for holding CBOR's binary data strings.

I could add an option to explicitly turn tagging of Uint8Arrays on or off. Would that help?

@kriszyp So in the browser we could still use Uint8Array as the buffer to hold the result of encoding. However, when there is no Buffer (i.e. the browser), if the input to the encoder is an ArrayBuffer we do not append d840 (i.e. tag 64), but if the input to the encoder is a Uint8Array then we add tag 64, as we currently do in Node. Would this be doable?
This would make the behaviour consistent in the browser as well as in Node: inputs to the encoder as ArrayBuffer are treated like Buffers in Node, and inputs as Uint8Array are treated as they are in Node (i.e. tag 64 is added). Thoughts?
Reason:
Imagine a CWT token is generated on a Node server and needs to be validated in some other V8 environment that doesn't have Buffer. If on Node the encoding was done with a Uint8Array as input, this would give a different encoding in the other environment (say the browser), thus resulting in a different signature and failure in verification.

The problem with adding a tag for Uint8Array on the browser is that this is kind of the main class for binary data, so I think most users would expect that they could use it without inducing tags. And we can't be sure that the encoded data will be sent to a CBOR decoder that understands these tags, so it is best to avoid tags unless needed to explicitly specify a non-default typed array. Changing this would also be backwards incompatible for any existing users that rely on the current behavior.

I think the best approach is to add a tagUint8Array flag that can be used to indicate whether you want Uint8Arrays tagged.

Sure @kriszyp, maybe we can go with a flag option for now.
However, I am still wondering: can there be a case where during encoding we pass something like [Buffer, Uint8Array] on Node? This would append a tag for the Uint8Array, but decoding the result with cbor-x would then give a different output, right?

Also, I tried one more case:

const claims = {
  name: 'shubham'
};
console.log(cborweb.encode(claims).toString('hex'));
console.log(encoder.encode(claims).toString('hex'));

Here both produce different output, but when the data is passed as a Map object the output is the same, which I think is because Map objects are handled separately. Is this the expected behaviour when we just pass a JSON object? My assumption was that the output from cbor and cbor-x should be the same in the same environment. (Tested on Node.)

However, I am still wondering: can there be a case where during encoding we pass something like [Buffer, Uint8Array] on Node? This would append a tag for the Uint8Array, but decoding the result with cbor-x would then give a different output, right?

By default on Node, if you serialize [Buffer, Uint8Array] the latter will be tagged, and it will be decoded back to a Buffer and a Uint8Array (with the new option you can explicitly turn this off with tagUint8Array: false).
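For example, a minimal sketch of opting out of the tag with a dedicated Encoder instance (assuming the tagUint8Array option described above):

const { Encoder, decode } = require('cbor-x');

// This encoder never wraps Uint8Arrays in tag 64, so Buffer and Uint8Array
// inputs produce the same byte strings on Node and in the browser
const encoder = new Encoder({ tagUint8Array: false });

const bytes = new Uint8Array([0xa1, 0x01, 0x05]);
const encoded = encoder.encode([Buffer.from([0x01, 0x02]), bytes]);
console.log(decode(encoded)); // with the tag disabled, both items round-trip as plain byte strings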

Is this the expected behaviour when we just pass a JSON object?

Yes, there are multiple valid encodings of a given data structure, and by default cbor-x uses the "fast" one. See the variableMapSize flag under https://github.com/kriszyp/cbor-x#options if you would prefer to always use the most compact map encoding.
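A quick sketch of how that flag could be applied (assuming the variableMapSize option documented in the README linked above):

const { Encoder } = require('cbor-x');

// Trades a little encoding speed for the most compact map headers
const compactEncoder = new Encoder({ variableMapSize: true });
console.log(compactEncoder.encode({ name: 'shubham' }).toString('hex'));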

FYI, these updates/fixes are available in cbor-x@1.4.0.

Hi @kriszyp, if I compare the encoded data size of { 1: [0, 124] } and { 1: 0, 2: 124 }, I get the same size. I tried multiple instances of the same thing as JSON objects in an array. Is this expected? I was of the opinion that the former would take less space since it does not require one more key field. Any idea what the reason could be, or is this expected because a "break" stop code is added after the last item internally to determine the end of the array?

@shubhamvrkr I think this would be expected. An array does indeed have an extra byte (the array header byte actually specifies the length; stop bytes are only used for iterators). So you are trading a byte for a property key for a byte for an array header. Now if you have multiple items in the array, that is definitely smaller than an object/map with multiple properties.

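To make the trade-off concrete, here is the canonical CBOR byte layout for the two shapes above (a worked example; cbor-x's exact output can differ slightly depending on encoder options):

{ 1: [0, 124] }  => a1 01 82 00 187c   (6 bytes: map(1), key 1, array(2), 0, 124)
{ 1: 0, 2: 124 } => a2 01 00 02 187c   (6 bytes: map(2), key 1, 0, key 2, 124)

The array header byte (82) in the first form takes the place of the extra key byte (02) in the second, so the totals come out the same.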

Thanks @kriszyp !

Hi @kriszyp
I have one question.
I am using the https://github.com/NordicSemiconductor/zcbor CDDL library to generate C code from a CDDL definition. The generated code is only for encoding.
However, while decoding with the cbor-x decode function I get 'Error: Data read, but end of buffer not reached', but if I use the decodeMultiple function I get the data as an array.

Example:

Encoded hex using the C code: 011a0003283e9f00001989940119ebc8021a0003148e03ff

Decoded via the decodeMultiple func:
[
  1,
  206910,
  [
    0, 0, 35220,
    1, 60360, 2,
    201870, 3
  ]
]

Encoded again via the cbor-x encode function (hex): 83011a0003283e8800001989940119ebc8021a0003148e03

Decoded via the cbor2json utility gives:
1
206910
[0,0,35220,1,60360,2,201870,3]

So I think the issue is how the data is encoded, i.e. the C code defines a struct and encodes each field defined in the struct separately.
I have attached the types.h and encode.c files that zcbor generated, for reference.

Is there a way to provide something like this and convert the data back to a JavaScript object? Or maybe a CDDL tool that can generate equivalent JS code the way it does for C?
I could still do it via the output of decodeMultiple, but is there a better way to decode this? E.g. imagine the CDDL definition changes in the future; I would just define a class definition for the decoder and it would decode and return the equivalent JS object.

sidecar.txt - CDDL definition
sidecar_encode.txt - encoder in C
sidecar_types.txt - header file for the sidecar struct
short.txt - sidecar structure in JSON

If I understand correctly, it sounds like you are probably looking for a JS CDDL decoder that can read a CDDL definition and automatically assign the decoded CBOR to properties based on position. Without it, you need to do something like:
sidecar = { id: decoded[0], filesize: decoded[1], segments: decoded[2].map(...) }
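For instance, a rough sketch of that manual mapping against the dump above (the property names, and the pairing of the flat segment list into { startRange, pos } objects, are guesses based on this thread rather than anything the library defines):

const { decodeMultiple } = require('cbor-x');

// The three top-level CBOR items produced by the zcbor-generated C encoder
const hex = '011a0003283e9f00001989940119ebc8021a0003148e03ff';
const [version, filesize, flatSegments] = decodeMultiple(Buffer.from(hex, 'hex'));

// Pair up the flat segment list; assumed layout: startRange followed by pos
const segments = [];
for (let i = 0; i < flatSegments.length; i += 2) {
  segments.push({ startRange: flatSegments[i], pos: flatSegments[i + 1] });
}

const sidecar = { version, filesize, segments };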

Correct @kriszyp! It would be even more helpful when the CDDL spec gets updated, since based on the version I could use the appropriate decoding code. Any idea if there is such a tool that generates JS code from CDDL?

Hi @kriszyp, I have one question:
I am encoding this data via C code generated from the CDDL spec. The definition is based upon CBOR maps.
{
  '1': 1,
  '2': [ { '1': 18446744073509551615, '3': 0 } ],
  '3': 18446744073509759395
}

The output hex produced by the C code is:
bf 01 01 03 1bfffffffff41769a3 02 9f bf 01 1bfffffffff4143dff 03 00 ff ff ff

Encoding 18446744073509759395 (uint64) => 1bfffffffff416972b
Encoding 18446744073509551615 (uint64) => 1bfffffffff4143dff
The bf byte starts an indefinite-length map ({ }) and 9f starts an indefinite-length array.

Decoding the above hex via cbor-x works properly. However, encoding it again produces a different result:

a3 01 01 02 81a2 01 1bfffffffff4143dff 03 00 03 1bfffffffff41769a3

Where do a3 and 81a2 come into the picture, and how is this info enough to decode it back?

Sidecar:

byterange-segment = {
  startRange => uint .size 8,
  pos => int .size 2 .ge -1
}

discrete-segment = {
  segmentRegex => text,
  pos => int .size 2 .ge -1
}

byterange-segment-array = (
  filesize => uint .size 8,
  segments => [+ byterange-segment]
)

discrete-segment-array = (
  segments => [+ discrete-segment]
)

segments-array = (discrete-segment-array // byterange-segment-array)

sidecar = {
  version => uint,
  segments-array
}

version = 1
filesize = 3
segments = 2
startRange = 1
segmentRegex = 2
pos = 3

This made it clear: https://www.rfc-editor.org/rfc/rfc7049#appendix-A
a3: map with a definite number of entries (i.e. 3)
81: array of length 1
a2: map with a definite number of entries (i.e. 2)
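To double-check that the two forms carry the same data, here is a small sketch (assuming cbor-x's decode export and Node's Buffer) that decodes both the indefinite-length bytes from the C code and the definite-length bytes produced by cbor-x:

const { decode } = require('cbor-x');

// Indefinite-length encoding produced by the zcbor-generated C code
const fromC = Buffer.from('bf0101031bfffffffff41769a3029fbf011bfffffffff4143dff0300ffffff', 'hex');
// Definite-length encoding produced by cbor-x
const fromCborX = Buffer.from('a301010281a2011bfffffffff4143dff0300031bfffffffff41769a3', 'hex');

// Both should decode to the same structure; only the map/array headers differ
console.log(decode(fromC));
console.log(decode(fromCborX));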