Feature request: add pointers to the start of the raw CBOR and the length of the raw CBOR in bytes to cbor_item_t
KoenGo opened this issue · 4 comments
I've been using the library for a while now and am currently implementing COSE (RFC 9052). One addition that would really help me is for cbor_item_t to include pointers (or indices) to the start and end of the item's raw CBOR data. This would save me from having to wrap CBOR items in an additional bytestring whenever I want to pass the raw CBOR around in my program.
For example, if I have this nested map in CBOR:
A1646B657931A1646B657932187B
Diagnostic notation:
{"key1": {"key2": 123} }
And I want to extract the nested map A1646B657932187B ({"key2": 123}) and pass it around as raw bytes, this cannot be done (AFAIK, please correct me if I'm wrong). What I have to do instead is wrap the nested map in a bytestring, 48A1646B657932187B, and then I can pass it around my program as I want. I don't want to burden my client applications with wrapping each object in a bytestring.
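For reference, the bytestring workaround amounts to prepending a CBOR byte-string head (major type 2, RFC 8949) to the already-encoded item: for the 8-byte nested map that head is the single byte 0x48. A minimal standalone sketch, not using libcbor; `bstr_head` and `wrap_in_bstr` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a CBOR byte-string head (major type 2) announcing `len` payload bytes.
   Returns the head size (1, 2, 3, 5 or 9 bytes); `out` needs room for 9. */
static size_t bstr_head(uint64_t len, uint8_t *out) {
    const uint8_t major = 2u << 5; /* 0x40 */
    if (len < 24) { out[0] = (uint8_t)(major | len); return 1; }
    size_t n;
    uint8_t ai;
    if (len <= 0xFF)             { ai = 24; n = 1; }
    else if (len <= 0xFFFF)      { ai = 25; n = 2; }
    else if (len <= 0xFFFFFFFFu) { ai = 26; n = 4; }
    else                         { ai = 27; n = 8; }
    out[0] = (uint8_t)(major | ai);
    for (size_t i = 0; i < n; i++) /* argument is big-endian */
        out[1 + i] = (uint8_t)(len >> (8 * (n - 1 - i)));
    return 1 + n;
}

/* Wrap an already-encoded CBOR item in a byte string.
   Returns the total number of bytes written, or 0 if `out_cap` is too small. */
static size_t wrap_in_bstr(const uint8_t *item, size_t item_len,
                           uint8_t *out, size_t out_cap) {
    uint8_t head[9];
    size_t h = bstr_head(item_len, head);
    if (out_cap < h || out_cap - h < item_len) return 0;
    memcpy(out, head, h);
    memcpy(out + h, item, item_len);
    return h + item_len;
}
```

Wrapping A1646B657932187B this way yields 48A1646B657932187B; the receiver then has to unwrap it again, which is the extra step the request wants to avoid.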
I envision it could look something like this:
typedef struct cbor_item_t {
    /* ... existing fields ... */
    unsigned char *start_ptr; // points to the first byte of the item in the raw CBOR encoding
    size_t length;            // length of the item's raw CBOR encoding in bytes
} cbor_item_t;
Or, if adding fields to cbor_item_t is a no-go, perhaps a utility function that could compute them?
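As a rough idea of what such a utility could do: the span of an item can be recovered from the buffer alone by walking the CBOR heads (RFC 8949) until the item ends. A minimal standalone sketch, assuming definite-length encodings only; `read_head` and `skip_item` are hypothetical names, not libcbor API:

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the head at buf[*pos] into major type and argument, advancing *pos.
   Returns -1 on truncated input or additional info 28-31 (reserved values,
   indefinite lengths, "break"), which this sketch does not support. */
static int read_head(const uint8_t *buf, size_t len, size_t *pos,
                     uint8_t *major, uint64_t *arg) {
    if (*pos >= len) return -1;
    uint8_t ib = buf[(*pos)++];
    uint8_t ai = ib & 0x1F;
    *major = ib >> 5;
    if (ai < 24) { *arg = ai; return 0; }
    if (ai > 27) return -1;
    size_t n = (size_t)1 << (ai - 24); /* 1, 2, 4 or 8 argument bytes */
    if (len - *pos < n) return -1;
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v = (v << 8) | buf[(*pos)++];
    *arg = v;
    return 0;
}

/* Find the end of the item starting at `off`. On success stores the offset one
   past its last byte in *end and returns 0; the item's raw encoding is then
   buf[off .. *end). `depth` bounds recursion on hostile input. */
static int skip_item(const uint8_t *buf, size_t len, size_t off,
                     size_t *end, int depth) {
    uint8_t major;
    uint64_t arg;
    size_t pos = off;
    if (depth > 64 || read_head(buf, len, &pos, &major, &arg) != 0) return -1;
    switch (major) {
    case 2: case 3: /* byte/text string: skip the payload */
        if (arg > len - pos) return -1;
        pos += (size_t)arg;
        break;
    case 4: case 5: /* array/map: skip children (maps hold key+value pairs) */
        for (uint64_t i = 0; i < arg; i++) {
            int per = (major == 5) ? 2 : 1;
            for (int k = 0; k < per; k++)
                if (skip_item(buf, len, pos, &pos, depth + 1) != 0) return -1;
        }
        break;
    case 6: /* tag: skip the tagged item */
        if (skip_item(buf, len, pos, &pos, depth + 1) != 0) return -1;
        break;
    default: /* major types 0, 1, 7: the head is the whole item */
        break;
    }
    *end = pos;
    return 0;
}
```

On the example above, the value of "key1" starts at offset 6 and `skip_item` reports its end at offset 14, i.e. the 8 bytes A1646B657932187B. The span is only usable while the original buffer is alive.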
Would it be possible to add this? Thanks in advance. Note: I could also take a crack at it myself if needed.
Hi @KoenGo, just to confirm I understand: the idea would be to do this when parsing, i.e. after running cbor_load, you would also get pointers to the relevant parts of the original buffer?
Yes, that's right
Got it, thanks. I see two complications:
- cbor_item_t can be constructed without parsing (so there is nothing to point to) or by streaming parsing (where the cbor_item_t outlives the buffer)
- Specifically, it's not clear how the lifetime and deallocation of the buffer should be handled. I see two options:
  - libcbor could take ownership of the buffer and ref-count it together with the other data
  - the user would be responsible for it, knowing that the pointers become invalid if the buffer changes
Neither of these seems to fit the current API. For example, what should cbor_copy do: copy the buffer and update all pointers (very weird), or alias the pointers (which breaks ref counting if the buffer is managed by libcbor, or creates a safety trap for the client if it isn't)?
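To make the first option concrete: if the library owned the buffer, every span (and every aliased copy of one) would need to hold a reference, and the buffer could only be freed when the last one is released. A minimal standalone sketch of that bookkeeping; `shared_buf`, `raw_span` and the helper functions are hypothetical, not libcbor API, and the sketch is not thread-safe:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A heap copy of the parsed input, shared by all spans that point into it. */
typedef struct {
    size_t refcount;
    size_t size;
    uint8_t *data;
} shared_buf;

/* A view of one item's raw encoding that keeps its buffer alive. */
typedef struct {
    shared_buf *owner;
    const uint8_t *start;
    size_t length;
} raw_span;

static shared_buf *shared_buf_from_copy(const uint8_t *src, size_t size) {
    shared_buf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->data = malloc(size ? size : 1);
    if (!b->data) { free(b); return NULL; }
    memcpy(b->data, src, size);
    b->size = size;
    b->refcount = 1; /* the parser's own reference */
    return b;
}

static void shared_buf_decref(shared_buf *b) {
    if (b && --b->refcount == 0) { free(b->data); free(b); }
}

/* Create a span into `b`; returns an empty span if the range is out of bounds. */
static raw_span span_make(shared_buf *b, size_t off, size_t len) {
    raw_span s = { NULL, NULL, 0 };
    if (off > b->size || len > b->size - off) return s;
    s.owner = b;
    s.start = b->data + off;
    s.length = len;
    b->refcount++;
    return s;
}

/* "Alias" copy (the cbor_copy question): same pointers, one more reference. */
static raw_span span_copy(const raw_span *s) {
    if (s->owner) s->owner->refcount++;
    return *s;
}

static void span_release(raw_span *s) {
    shared_buf_decref(s->owner);
    s->owner = NULL;
    s->start = NULL;
    s->length = 0;
}
```

With the second option (user-owned buffer) none of this bookkeeping exists, which is simpler but leaves aliased pointers silently dangling once the user frees the buffer.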
So my first take is that adding such a pointer to cbor_item_t is not an option. The concept sounds reasonable to me (it's useful information from parsing that we currently throw away), so let's explore other options.
Could you say a bit more about the use case? Why is it useful to have the original bytes around (vs. just calling cbor_serialize when needed)?
One idea would be to pass the position data into the streaming decoder callbacks (https://github.com/PJK/libcbor/blob/master/src/cbor/streaming.c). A special decoder could then record the locations and forward everything else to the default callbacks that build the cbor_item_t. It's not clear where the locations would be stored, though; cbor_item_t would still be the best place, since it naturally handles nested items.
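A rough sketch of the "record location" half of that idea, written as a standalone walker rather than against libcbor's streaming callbacks: it reports every item's offset and length to a user callback, including nested items, once the item's end is known (post-order). `walk`, `span_cb`, `span_log` and `record_span` are hypothetical names; indefinite-length items are not handled:

```c
#include <stddef.h>
#include <stdint.h>

/* Invoked once per item, after the item's end is known. */
typedef void (*span_cb)(void *ctx, int depth, uint8_t major,
                        size_t offset, size_t length);

/* Decode one head (RFC 8949): major type and argument; advances *pos. */
static int head(const uint8_t *b, size_t len, size_t *pos,
                uint8_t *major, uint64_t *arg) {
    if (*pos >= len) return -1;
    uint8_t ib = b[(*pos)++], ai = ib & 0x1F;
    *major = ib >> 5;
    if (ai < 24) { *arg = ai; return 0; }
    if (ai > 27) return -1; /* reserved / indefinite: unsupported here */
    size_t n = (size_t)1 << (ai - 24);
    if (len - *pos < n) return -1;
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v = (v << 8) | b[(*pos)++];
    *arg = v;
    return 0;
}

/* Walk the item at `off`, reporting it and all nested items to `cb`. */
static int walk(const uint8_t *b, size_t len, size_t off, int depth,
                span_cb cb, void *ctx, size_t *end) {
    uint8_t major;
    uint64_t arg, children = 0;
    size_t pos = off;
    if (depth > 64 || head(b, len, &pos, &major, &arg) != 0) return -1;
    if (major == 2 || major == 3) {          /* strings: skip payload */
        if (arg > len - pos) return -1;
        pos += (size_t)arg;
    } else if (major == 4) {                 /* array */
        children = arg;
    } else if (major == 5) {                 /* map: keys and values */
        if (arg > UINT64_MAX / 2) return -1;
        children = arg * 2;
    } else if (major == 6) {                 /* tag: one tagged item */
        children = 1;
    }
    for (uint64_t i = 0; i < children; i++)
        if (walk(b, len, pos, depth + 1, cb, ctx, &pos) != 0) return -1;
    cb(ctx, depth, major, off, pos - off);
    *end = pos;
    return 0;
}

/* Example consumer: log every span into a fixed-size table. */
typedef struct { int depth; uint8_t major; size_t offset, length; } span_rec;
typedef struct { span_rec recs[32]; size_t n; } span_log;

static void record_span(void *ctx, int depth, uint8_t major,
                        size_t offset, size_t length) {
    span_log *log = ctx;
    if (log->n < sizeof log->recs / sizeof log->recs[0])
        log->recs[log->n++] = (span_rec){ depth, major, offset, length };
}
```

For the example message, the nested map shows up as a depth-1 map at offset 6 with length 8. A real integration would also forward each event to the callbacks that build the cbor_item_t; where to keep the recorded spans is the open question from the comment above.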
I'll explain a bit more about my use case for having access to the raw CBOR. Let's start with an introduction to my project. I am writing an application that implements COSE (RFC 9052) and a custom cryptographic protocol on an embedded device (with compute power on the order of an older-generation Raspberry Pi). I approached the project with a focus on maintainability and modularity, so I want to be able to pass raw CBOR from one module to another, independent of the parser being used. The basic flow of my program looks like this:
- Module 1 receives a cbor message
- The CBOR message is parsed into a struct; bytestrings are copied over to the struct.
- The cbor_item_t is decref'd.
- The struct is used to perform some operations
- Module 1 now wants to pass the results of the operation, along with part of the original CBOR message, on to module 2 (or even to another program over a network). This is where I wish I could copy the raw CBOR of a specific cbor_item_t into a buffer and pass it on.
- Module 2 receives a cbor message and the result of the operation for further processing
The parser used in module 2 could be libcbor, a custom parser, or even a parser in another language. If my program were compiled into a single binary, I could pass the cbor_item_t around (though I'm still not a big fan of that in terms of modularity and maintainability). That is not the case, so the CBOR has to be passed in raw form. As you stated, getting the raw CBOR back can be achieved by simply serializing the cbor_item_t with cbor_serialize. However, I will be dealing with complex nested CBOR structures that can get quite large (from several MB to tens of MB). Because of the modular design I am willing to sacrifice some efficiency, but I can imagine efficiency becoming a problem here. That said, I haven't done any profiling, so I can't say for sure.
Coming to my second point: the application I am writing relies on digital signatures. When verifying a signature, it is critical that the message being verified matches the message that was used to create the signature byte for byte, or verification will fail. This is one of the primary reasons I am using CBOR: it is a binary format, and digital signatures are computed over raw bytes. You can already see the problem here. Once access to the original CBOR is lost, the serializer MUST recreate the original bytes exactly. Therefore, re-serializing the cbor_item_t to raw CBOR for signature verification is asking for trouble, especially when mixing different parsers.
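A small illustration of why re-serializing is risky: CBOR allows the same integer to be encoded with different argument widths. Below, {"key2": 123} is encoded once in preferred serialization (123 as 18 7B, see RFC 8949 Section 4.1) and once with a 2-byte argument (19 00 7B). Both decode to the same value, yet the bytes differ, so a signature computed over one will not verify over the other. A standalone sketch; `read_uint` and `same_bytes` are hypothetical helpers, and nothing here describes what libcbor's serializer actually emits:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* {"key2": 123} with 123 in preferred serialization: 18 7B */
static const uint8_t map_preferred[] = {0xA1, 0x64, 'k', 'e', 'y', '2', 0x18, 0x7B};
/* The same map with 123 in a 2-byte argument: 19 00 7B (well-formed, not preferred) */
static const uint8_t map_wide[] = {0xA1, 0x64, 'k', 'e', 'y', '2', 0x19, 0x00, 0x7B};

/* Decode an unsigned integer (major type 0). Returns bytes consumed, 0 on error. */
static size_t read_uint(const uint8_t *p, size_t len, uint64_t *out) {
    if (len == 0 || (p[0] >> 5) != 0) return 0;
    uint8_t ai = p[0] & 0x1F;
    if (ai < 24) { *out = ai; return 1; }
    if (ai > 27) return 0;
    size_t n = (size_t)1 << (ai - 24);
    if (len - 1 < n) return 0;
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v = (v << 8) | p[1 + i];
    *out = v;
    return 1 + n;
}

/* Byte-for-byte equality: the property a signature over raw bytes depends on. */
static int same_bytes(const uint8_t *a, size_t alen, const uint8_t *b, size_t blen) {
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Map key order is another place where two encoders can legitimately disagree, which is part of why keeping the received bytes is safer than regenerating them.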
I agree that it's pretty nasty that the pointers become invalid if the buffer is deallocated or goes out of scope. The way I envisioned it, the user would be responsible for ensuring that no pointers are used after the buffer is deallocated. I primarily use the library to parse CBOR data into structs, and in my case the cbor_item_t never outlives the buffer. However, I can see how this is not the case for some users. One solution could be to create a separate cbor_load function that does store the pointers in the cbor_item_t. If the cbor_item_t is zero-initialized, the pointer is NULL and the length is 0, and the original cbor_load would leave them untouched. Then it would be easy to check for a valid pointer (granted that the buffer is still valid).
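A minimal sketch of how a client might consume such fields under the proposed convention (zero-initialized means "no span recorded"), e.g. module 1 copying an item's raw bytes out before handing them to module 2. `item_with_span`, `has_raw_span` and `copy_raw_span` are hypothetical; the struct stands in for the proposed cbor_item_t fields and is not libcbor API:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the proposed fields. A zero-initialized item has
   start_ptr == NULL and length == 0, meaning "no raw span recorded". */
typedef struct {
    /* ... other item fields would live here ... */
    const unsigned char *start_ptr;
    size_t length;
} item_with_span;

static int has_raw_span(const item_with_span *it) {
    return it->start_ptr != NULL && it->length > 0;
}

/* Copy the item's raw encoding into `dst`. Returns bytes copied, or 0 if no
   span was recorded or `cap` is too small. This cannot detect a dangling
   pointer: the caller must still guarantee the source buffer is alive. */
static size_t copy_raw_span(const item_with_span *it, unsigned char *dst, size_t cap) {
    if (!has_raw_span(it) || cap < it->length) return 0;
    memcpy(dst, it->start_ptr, it->length);
    return it->length;
}
```

The NULL/0 check distinguishes items built without span tracking from tracked ones, but as noted above it says nothing about whether the underlying buffer is still valid.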