kriszyp/cbor-x

Support for cbor-x on a plain V8 JavaScript engine

shubhamvrkr opened this issue · 18 comments

Hi @kriszyp

I am trying to use this package in an environment other than Node and the browser, one that has no global object or any browser/Node-specific functions.
Executing on V8 gives an undefined error for the global object referenced here: https://github.com/kriszyp/cbor-x/blob/master/decode.js#L825

It looks like the glbl object could be replaced with a plain map from type names to the native constructors. Can we do something like this, or am I missing something?

let glbl = {
  'Uint8Array': Uint8Array,
  'Uint8ClampedArray': Uint8ClampedArray,
  'Uint16Array': Uint16Array,
  'Uint32Array': Uint32Array,
  'BigUint64Array': BigUint64Array,
  'Int8Array': Int8Array,
  'Int16Array': Int16Array,
  'Int32Array': Int32Array,
  'BigInt64Array': BigInt64Array,
  'Float32Array': Float32Array,
  'Float64Array': Float64Array,
  'RegExp': RegExp
};

I suppose we could do something like that. There is a little more to it; BigInt isn't available on all supported platforms, and we had also been using glbl.RegExp, glbl.Error, and glbl.Set. I'll see what I can do, though.
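For reference, a rough sketch of what such a lookup could look like with the BigInt-dependent constructors guarded (just an illustration, not the actual change that landed):

let glbl = {
  Uint8Array, Uint8ClampedArray, Uint16Array, Uint32Array,
  Int8Array, Int16Array, Int32Array,
  Float32Array, Float64Array,
  RegExp, Error, Set,
  // BigInt-backed typed arrays may not exist on every engine, so guard them
  BigUint64Array: typeof BigUint64Array !== 'undefined' ? BigUint64Array : undefined,
  BigInt64Array: typeof BigInt64Array !== 'undefined' ? BigInt64Array : undefined
};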

Sure @kriszyp, thanks! Kindly share any progress; it's kind of urgent.

Hi @kriszyp, I am facing an issue while encoding content. It looks like this is a bug. With the cbor-web lib it works, but I can't use cbor-web in our environment. I am using 1.3.3 of cbor-x.

Here's a code snippet:

const p = hexStringToUint8Array("a10105")
const payload = hexStringToUint8Array("a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71");

const MACstructure2 = [
  'MAC0',              // 'MAC0' or 'MAC1', // context
  p.buffer,            // protected
  new ArrayBuffer(0),  // bstr,
  payload.buffer
];
const tobeMaced2 = cbor.encode(MACstructure2)
const tobeMaced2CborWeb = cborweb.encode(MACstructure2)

I get different output from the two libraries:

tobeMaced2: 84644d41433043006000405850856fbb7f00000800000001000000000000000000000078ca856fbb7f0000000000000000000088ca856fbb7f0000000000000000000098ca856fbb7f00008804a06dbb7f0000ffffffffffffffff0000
tobeMaced2CborWeb: 84644d41433043a10105405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71

Looks like cbor-x is not able to read ArrayBuffer content. However, if I pass a Uint8Array instead of an ArrayBuffer I get the same output from both libs, but that output has changed from the original one.

const MACstructure2 = [
  'MAC0',             // 'MAC0' or 'MAC1', // context
  p,                  // protected
  new Uint8Array(0),  // bstr,
  payload
];

tobeMaced2: 84644d414330d84043a10105d84040d8405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71
tobeMaced2CborWeb: 84644d414330d84043a10105d84040d8405850a70175636f61703a2f2f61732e6578616d706c652e636f6d02656572696b77037818636f61703a2f2f6c696768742e6578616d706c652e636f6d041a5612aeb0051a5610d9f0061a5610d9f007420b71

This becomes an issue when the encoding was done in Node.js or any other language and the result is decoded by this library. So I guess cbor-x needs to handle reading from ArrayBuffer. Let me know your thoughts on this.

A simpler example:

const { encode } = require('cbor-x');

const data = Buffer.from("a10105", "hex");
var a = new Uint8Array(data.length);
for (var i = 0; i < data.length; i++) a[i] = data[i];
encode(data).toString('hex');     // 43a10105 => correct
encode(a.buffer).toString('hex'); // 43600000 => wrong
encode(a).toString('hex');        // d84043a10105 => correct for Uint8Array

@kriszyp Also, if I toggle this: const hasNodeBuffer = typeof Buffer === 'undefined', the encoding of a Uint8Array does not append the d840 tag, which gives inconsistent behaviour. I think when the input is given as an ArrayBuffer in JS we shouldn't append the d840 tag, but when it's a Uint8Array we should add it. Thoughts?

Yes, you are right, ArrayBuffers were not being properly serialized. I have committed a fix for this and removed reliance on the global object.

Also, if I toggle this: const hasNodeBuffer = typeof Buffer === 'undefined', the encoding of a Uint8Array does not append the d840 tag.

That is because the most common class for holding binary data in Node.js is the Buffer (returned by most Node APIs that return binary data), whereas in the browser, Uint8Array is the most common class for binary data (Buffer isn't available). So each of these is used as the preferred class for holding CBOR's binary data strings.

I could add an option to explicitly turn tagging of Uint8Arrays on or off. Would that help?

@kriszyp So in the browser we could still use Uint8Array as the buffer to hold the result of encoding. However, when there is no Buffer (i.e. the browser), if the input to the encoder is an ArrayBuffer we do not append d840 (i.e. tag 64), but if the input to the encoder is a Uint8Array then we add tag 64, as we currently do in Node. Would this be doable?
This would make the behaviour consistent in the browser as well as in Node: inputs to the encoder as ArrayBuffer are treated like Buffers in Node, and inputs as Uint8Array are treated as they are in Node (i.e. tag 64 is added). Thoughts?
Reason:
Imagine a CWT token is generated on a Node server and needs to be validated in some other V8 environment that doesn't have Buffer. If on Node the encoding was done with a Uint8Array as input, this would give a different encoding in the other environment (say the browser), thus resulting in a different signature and failure in verification.

The problem with adding a tag for Uint8Array on the browser is that this is kind of the main class for binary data, so I think most users would expect that they could use it without inducing tags. And we can't be sure that the encoded data will be sent to a CBOR decoder that understands these tags, so it is best to avoid tags unless needed to explicitly specify a non-default typed array. Changing this would also be backwards incompatible for any existing users that rely on the current behavior.

I think the best approach is to add a tagUint8Array flag that can be used to indicate whether you want Uint8Arrays tagged.

Sure @kriszyp, maybe we can go with a flag option for now.
However, I am still wondering: can there be a case where during encoding we pass something like [Buffer, Uint8Array] on Node? This would append a tag for the Uint8Array, but decoding the result with cbor-x would then give a different output, right?

Also, I tried one more case:

const claims = {
  name: 'shubham'
};
console.log(cborweb.encode(claims).toString('hex'));
console.log(encoder.encode(claims).toString('hex'));

Here both produce different output, but when the data is passed as a Map object the output is the same, which I think is because Map objects are handled separately. Is this the expected behaviour when we just pass a JSON object? My assumption was that the output from cbor and cbor-x should be the same in the same environment. (Tested on Node.)

However, I am still wondering: can there be a case where during encoding we pass something like [Buffer, Uint8Array] on Node? This would append a tag for the Uint8Array, but decoding the result with cbor-x would then give a different output, right?

By default on Node, if you serialize [Buffer, Uint8Array] the latter will be tagged, and it will be decoded back to a Buffer and a Uint8Array (with the new option you can explicitly turn this off with tagUint8Array: false).
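For example, a minimal sketch of opting out of the tag with a dedicated Encoder instance (assuming the tagUint8Array option described above):

const { Encoder, decode } = require('cbor-x');

// This encoder never wraps Uint8Arrays in tag 64, so Buffer and Uint8Array
// inputs produce the same byte strings on Node and in the browser
const encoder = new Encoder({ tagUint8Array: false });

const bytes = new Uint8Array([0xa1, 0x01, 0x05]);
const encoded = encoder.encode([Buffer.from([0x01, 0x02]), bytes]);
console.log(decode(encoded)); // with the tag disabled, both items round-trip as plain byte strings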

Is this the expected behaviour when we just pass a JSON object?

Yes, there are multiple valid encodings of a given data structure, and by default cbor-x uses the "fast" one. See the variableMapSize flag under https://github.com/kriszyp/cbor-x#options if you would prefer to always use the most compact map encoding.
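A quick sketch of how that flag could be applied (assuming the variableMapSize option documented in the README linked above):

const { Encoder } = require('cbor-x');

// Trades a little encoding speed for the most compact map headers
const compactEncoder = new Encoder({ variableMapSize: true });
console.log(compactEncoder.encode({ name: 'shubham' }).toString('hex'));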

FYI, these updates/fixes are available in cbor-x@1.4.0.

Hi @kriszyp, if I compare the encoded data size of { 1: [0, 124] } and { 1: 0, 2: 124 }, I get the same size. I tried multiple instances of the same thing as JSON objects in an array. Is this expected? I was of the opinion that the former would take less space since it does not require one more key field. Any idea what the reason could be, or is this expected because a "break" stop code is added after the last item internally to determine the end of the array?

@shubhamvrkr I think this would be expected. An array does indeed have an extra byte (the array header byte actually specifies the length; stop bytes are only used for iterators). So you are trading a byte for a property key for a byte for an array header. Now if you have multiple items in the array, that is definitely smaller than an object/map with multiple properties.

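To make the trade-off concrete, here is the canonical CBOR byte layout for the two shapes above (a worked example; cbor-x's exact output can differ slightly depending on encoder options):

{ 1: [0, 124] }  => a1 01 82 00 187c   (6 bytes: map(1), key 1, array(2), 0, 124)
{ 1: 0, 2: 124 } => a2 01 00 02 187c   (6 bytes: map(2), key 1, 0, key 2, 124)

The array header byte (82) in the first form takes the place of the extra key byte (02) in the second, so the totals come out the same.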

Thanks @kriszyp !

Hi @kriszyp
I have one question.
I am using the https://github.com/NordicSemiconductor/zcbor CDDL library to generate C code from a CDDL definition. The generated code is only for encoding.
However, while decoding with the cbor-x decode function I get 'Error: Data read, but end of buffer not reached', but if I use the decodeMultiple function I get the data as an array.

Example:

Encoded hex using the C code: 011a0003283e9f00001989940119ebc8021a0003148e03ff

Decoded via the decodeMultiple func:
[
  1,
  206910,
  [
    0, 0, 35220,
    1, 60360, 2,
    201870, 3
  ]
]

Encoded again via the cbor-x encode function (hex): 83011a0003283e8800001989940119ebc8021a0003148e03

Decoded via the cbor2json utility gives:
1
206910
[0,0,35220,1,60360,2,201870,3]

So I think the issue is how the data is encoded, i.e. the C code defines a struct and encodes each field defined in the struct separately.
I have attached the types.h and encode.c files that zcbor generated, for reference.

Is there a way to provide something like this and convert the data back to a JavaScript object? Or maybe a CDDL tool that can generate equivalent JS code the way it does for C?
I could still do it via the output of decodeMultiple, but is there a better way to decode this? E.g. imagine the CDDL definition changes in the future; I would just define a class definition for the decoder and it would decode and return the equivalent JS object.

sidecar.txt - CDDL definition
sidecar_encode.txt - encoder in C
sidecar_types.txt - header file for the sidecar struct
short.txt - sidecar structure in JSON

If I understand correctly, it sounds like you are probably looking for a JS CDDL decoder that can read a CDDL definition and automatically assign the decoded CBOR to properties based on position. Without it, you need to do something like:
sidecar = { id: decoded[0], filesize: decoded[1], segments: decoded[2].map(...) }
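For instance, a rough sketch of that manual mapping against the dump above (the property names, and the pairing of the flat segment list into { startRange, pos } objects, are guesses based on this thread rather than anything the library defines):

const { decodeMultiple } = require('cbor-x');

// The three top-level CBOR items produced by the zcbor-generated C encoder
const hex = '011a0003283e9f00001989940119ebc8021a0003148e03ff';
const [version, filesize, flatSegments] = decodeMultiple(Buffer.from(hex, 'hex'));

// Pair up the flat segment list; assumed layout: startRange followed by pos
const segments = [];
for (let i = 0; i < flatSegments.length; i += 2) {
  segments.push({ startRange: flatSegments[i], pos: flatSegments[i + 1] });
}

const sidecar = { version, filesize, segments };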

Correct @kriszyp! It would be even more helpful when the CDDL spec gets updated, since based on the version I could use the appropriate decoding code. Any idea if there is such a tool that generates JS code from CDDL?

Hi @kriszyp, I have one question:
I am encoding this data via C code generated from the CDDL spec. The definition is based upon CBOR maps.
{
  '1': 1,
  '2': [ { '1': 18446744073509551615, '3': 0 } ],
  '3': 18446744073509759395
}

The output hex produced by the C code is:
bf 01 01 03 1bfffffffff41769a3 02 9f bf 01 1bfffffffff4143dff 03 00 ff ff ff

Encoding 18446744073509759395 (uint64) => 1bfffffffff416972b
Encoding 18446744073509551615 (uint64) => 1bfffffffff4143dff
The bf byte starts an indefinite-length map ({ }) and 9f starts an indefinite-length array.

Decoding the above hex via cbor-x works properly. However, encoding it again produces a different result:

a3 01 01 02 81a2 01 1bfffffffff4143dff 03 00 03 1bfffffffff41769a3

Where do a3 and 81a2 come into the picture, and how is this info enough to decode it back?

Sidecar:

byterange-segment = {
  startRange => uint .size 8,
  pos => int .size 2 .ge -1
}

discrete-segment = {
  segmentRegex => text,
  pos => int .size 2 .ge -1
}

byterange-segment-array = (
  filesize => uint .size 8,
  segments => [+ byterange-segment]
)

discrete-segment-array = (
  segments => [+ discrete-segment]
)

segments-array = (discrete-segment-array // byterange-segment-array)

sidecar = {
  version => uint,
  segments-array
}

version = 1
filesize = 3
segments = 2
startRange = 1
segmentRegex = 2
pos = 3

This made it clear: https://www.rfc-editor.org/rfc/rfc7049#appendix-A
a3: map with a definite number of entries (i.e. 3)
81: array of length 1
a2: map with a definite number of entries (i.e. 2)
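To double-check that the two forms carry the same data, here is a small sketch (assuming cbor-x's decode export and Node's Buffer) that decodes both the indefinite-length bytes from the C code and the definite-length bytes produced by cbor-x:

const { decode } = require('cbor-x');

// Indefinite-length encoding produced by the zcbor-generated C code
const fromC = Buffer.from('bf0101031bfffffffff41769a3029fbf011bfffffffff4143dff0300ffffff', 'hex');
// Definite-length encoding produced by cbor-x
const fromCborX = Buffer.from('a301010281a2011bfffffffff4143dff0300031bfffffffff41769a3', 'hex');

// Both should decode to the same structure; only the map/array headers differ
console.log(decode(fromC));
console.log(decode(fromCborX));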