pyfisch/cbor

Looking for Maintainers

pyfisch opened this issue ยท 37 comments

Hi everyone,

when I started the serde_cbor crate in September 2015 it was a quick experiment on how to implement a new data format in serde. In the years since then the crate was much improved and it now appears to be used in quite a few projects.

I'd like to thank all contributors including @chrysn (no-std and IO reader), @sfackler (a near complete rewrite in 2017 and several other changes), @baloo, @wildarch (no-std support), @rklaehn (tags) and many more. In this time I have been the person who replied to many issues, did small fixes and reviewed most pull requests.

The crate has a bunch of build flags (cargo features) including std, alloc (like std but without io) and tags. CBOR documents can both be (de-)serialized using Read/Write and slices/Vec<u8>. At run-time the user can choose between legacy-enums or standard serde enum and packed encoding in addition to the options offered by serde. Many issues discovered by users were due to a bad interaction between multiple features. Changes to the crate sometimes resulted in problems in a seemingly unrelated location. Within the past year it became increasingly clear to me that I don't understand all the details and interactions of different features. This is in part because I accepted pull requests implementing new features but did not review the code thoroughly enough. Some people expect me to add new features they need for their own project. If I reject these features because they complicate the crate and I don't want to maintain them they are unhappy.

I currently don't use CBOR in any project and therefore I maintained the crate mainly as a service to the Rust community. As serialization is not very exciting I kind of lost interest and often people need to wait a long time before I reply to their issues or pull requests.

For these reasons I decided not to continue the development of this crate and only occasionally review pull requests with bug fixes.

Unless of course a Rustacean who wants to continue the development and maintain the crate. There appears to be demand for a good CBOR implementation in Rust but currently this crate does not fit all use cases. To make the crate more useful and bug free I believe an almost complete rewrite is necessary.

The new project should consist of two parts.
First a low-level crate that parses and writes CBOR but does not depend on serde. It should support reading all valid CBOR documents including those that contain tags and unknown simple values. Users should be able to choose all serialization details including the bit-width of integers and floats. This enables the implementation of custom serialization, validation and "canonical" encodings.
The second part is a crate implements CBOR for serde and builds on the first crate. It is similar to this crate now but it should be configuration free and just serialize every data structure provided by serde. If people need packed encodings, alternative serializations, tags they can either use attributes provided by serde, add their own serialization attributes or depend only on the low-level crate. This is because the current configuration options often cause problems as noted above.
The proposed rewrite wouldn't be backwards compatible but right now the crate is a dead-end and I don't think it can progress to 1.0 from there neither can the existing bugs all be fixed. I'll be happy to give advice on the rewrite and help with problems.

If you want to maintain the crate (or even attempt the rewrite) leave a comment.

Thanks for reading all this!

I'm interested, but I'm not a Rust expert.

Interested. I'm using serde_cbor to serialize structures into a KV store for a new project of mine.
First of all, thanks for creating this project, supporting it until now and being open about your decision to move on. So many crates become neglected without anyone noticing.
I'll take a look at the source and consider what it might take to create the new version.
Thanks again. ๐Ÿ‘

Iโ€™m interested as well. I always wanted to do something like this and it would be a nice challenge for me.

Hi @pyfisch. Thanks for the repo. If needed I can help you set up a team of people who carry forward maintaining this crate.

vmx commented

Hi @pyfisch, thanks for all your work and the time you invested in getting tag support in and discussing related things with me.

The new project should consist of two parts.
[โ€ฆ]

I fully agree with your assessment. I've nothing to add.

I'd be happy to take some of the workload on the first-part (CBOR serialization / deserialization), as I don't have sufficient understanding for the second (serde) -- but of course happy to collaborate with whoever would work on that.

Who'd be in for the serde part?

So how shall we go about this? Given that this crate has, so far, served so many of us well, it certainly makes a good starting point, I'd probably start off a branch that removes the serde parts and exposes the more raw (de)serialization of static CBOR primitives, using its the removed parts as guidance as to what the serde part will expect. Whom may I ping to review this at its early stages?

I can do the serde part. I suggest to expose a struct or enum which represents any data that can be serialized and deserialized like serde_json does it with their Value enum.

I suggest to expose a struct or enum which represents any data that can be serialized and deserialized like serde_json does it with their Value enum.

That's provided by serde_cbor::Value, I think.

I am not interested in being a full maintainer, but am willing to review code as necessary. Feel free to ping me as desired!

I think it may be good for cbor to be under https://github.com/serde-rs/cbor? Not sure what @dtolnay thinks.

If there is a rewrite, I am also interested to help out with the serde part.

Hi everyone,

there are surely more people willing to invest time in this crate than I expected. ๐Ÿ˜ƒ

@hansl messaged me in private and offered to write both a non-serde CBOR crate, the serde integeration and maintain the crates in the future. He has written a README for the low-level API: hansl@cbb5fb5 I've already added a few remarks, maybe you want to have a look and add your comments.

hansl commented

Hey guys. I did some more work this weekend and I think I'm getting close to a low-level serialization API for CBOR. You can see the latest here.

I am now working on the latest extensions (simple values, floats, bigint, dates, ...). I'll update the README at some point in the near future. Non of that is owned (I will add a ValueOwned at some point, and I will add specific tests in a separate no_std crate for no_std, although I made sure my code compiles with it).

If you have comments, please feel free to either:

  1. reply here,
  2. send me a tweet (@hanslatwork), or
  3. comment on the latest commit on that branch.

Don't send me issues or PR on my repo yet.

Next steps:

  • finish serialization of Value
  • serializer that serializes values without necessarily validating inputs
  • implement ValueOwned
  • start deserialization
  • start schema
  • try to get the keys to republish on top of the cbor crate if possible, as a new major version (breaking change)
  • rebuild serde_cbor on top of the low level API
  • (as part of the rebuild but just so I don't forget) implement ALL the tests from serde_cbor to avoid regressions
  • publish the new serde_cbor with the new code re-written
  • prop tests for all serialization + deserialization

Let me know if I'm missing something!

@hansl Thanks for taking up that work !
Since the crate intends to be very low-level, I would expect a zero-allocation API. Is that possible with what you have in mind ? Currently it looks to me like it's building big values on the stack, but I'd rather the API could write the bytes directly to a buffer or a file without needing to construct intermediate objects.

hansl commented

Hi @Nadrieril,

it looks to me like it's building big values on the stack

Correct, it can take up to 12 bytes on the stack if my calculations are correct. But for this to be the case you need to encode those bytes in a value. Is &u64 (instead of 8 u8 directly) what you had in mind?

If all you want is encode values from raw bytes and send them to a writer (without holding a Value object), I would wait for the serializer structure which will be closer to what you're looking for.

hansl commented

@Nadrieril

I added a Serializer structure to the cbor (see test here). It takes a value and encodes it and send it to the writer directly. Now the current implementation is naive; it actually creates a Value<'_> on the stack and write it. But I think we can improve on this later without changing the API.

The big advantage is also that you can encode "raw" maps or arrays, where you just send the length and then encode values one by one. This is also true for indefinite bytestrings, text, arrays and maps. So you don't need to have the full array/map in memory anymore before you encode it. The maximum stack space you'll need with this structure is the size of Value itself, which as I said we can rework.

Let me know if you have comments.

@hansl Would it be better to have cbor::vec! macro rather than using cbor::vector with vec!?

hansl commented

@pickfire That API has changed but the README was not updated. Thanks for following up though, I'll update the README when I get a chance.

Since there are more people willing to help, where does the development continue? Do we ditch all old code, and start over, or is there a gradual migration path?

In order to push this forward, it might make sense to start a rust-cbor organization, and add the people here

https://github.com/rust-cbor

I also opened up a matrix chat room at #rust-cbor:matrix.org

hansl commented

See my comments above. I have a todo list to finish the cbor crate itself, before switching to serde-cbor, and I have a branch running in my repo.

Considering I have time to put in about 5 hours a week on this right now it's hard for me to make significant progress quickly, but I'm still going at a good pace. However, I won't move this to a new rust-cbor org for the purpose of a single repo. I'd rather have it hosted in the serde org anyway. It's the most natural place to live, IMO.

I'm waiting to have something to show (which I'm about halfway through actually) before asking them to create a repo there. From that point we can gather CI and CD requirements and set that up.

I submitted a PR (#184) with the goal of making incremental progress, since that way others can be helpful (and the tests can continue to pass). I have some other commits locally with a similar goal, but it's easier to discuss them one at a time.

hansl commented

Update: Deserialization was added this weekend for numbers, along with a number of small improvements to the way MajorType/MinorType interacts, and how traits interact (Write and WriteTo). This is a very minimal deserializer for now (and will be kept this way); there are ways to break it and ways to deserialize invalid CBOR for example. That's by design; most functions will have a check, a peek and a read variant of the function to either read all or error, read-ahead or None, and modify a mutable slice along.

I also added proptests for numbers to serialize + deserialize and validate they're the same numbers. Since we also test that serialization to a hex codes and deserialization from hex codes work, we should have pretty much everything covered for tests (aside from adding tests for new features).

Next step is finishing deserialization of other values. Once that's done I would like to discuss taking over the cbor crate for this new low-level one.

Please refer to the todo list in my previous comment for the status of what remains to be done.

Regardless of what decision is made, I would strongly request the stability in Rust version to be taken into consideration. Supporting only very new Rust versions is clearly unproductive for the ecosystem and serde specifically has always had a strong position in supporting older Rust versions.

IIRC the current crate supports v1.32 or v1.33, it would be very much appreciated if that support can be maintained in the rewrite.

Thanks! I'll certainly keep following the conversation here.

I submitted 4 pull requests today. This is almost exclusively on the "top" end of the current crate. This implies that the low-level work should be unimpacted by these changes. Would anyone be willing to review them?

Would anyone be willing to review them?

I am not willing to merge them. Sorry. To foundations of this crate have some long-standing issues and I don't want to add any additional stuff on top. There have been many features added by contributors but sometimes these features had subtle bugs or interacted badly with each other. In this case I am the one struggling to understand the issue and keeping the feature working.

When the rewrite is done and the crate has a new maintainer I am sure you they will review your pull requests. In the meantime I suggest you maintain a fork of this crate with all the additional features you need.

vi commented

What is current maintainership status of this library?

Has "Looking for Maintainers" resulted in a new maintainer?

hansl commented

That would be me. Sorry, I kind of came forward in these comments but I haven't finished rewriting the low level one.

The current work is at https://github.com/hansl/cbor/tree/rewrite.

Hi @hansl,

I saw you continued the rewrite a few days ago. ๐Ÿ˜„ If you want feedback let me know.

I did get a reply to my email and I interpreted your comment as meaning you wouldn't continue the rewrite. Therefore I created a project with the intend to distribute the items to volunteers interested in a rewrite. Sorry for that.

hansl commented

Hi. I re-added the current serde_cbor code to my repo. I'm going to talk to the serde org to move it there, and I'm in the process of adding github actions and removing travis.

I think I'm in a good place to start accepting PRs and issues, but I'd like to wait until the code is in the serde org (or isn't, if they don't want it there).

vi commented

@hansl, What is status of the rewrite? Are there any plans on releasing it to crates.io?

This is just a note to everyone that there is now a new serde-enabled CBOR crate named ciborium (documentation here).

This was built by the Enarx project which is actively using CBOR and needed a robust, flexible implementation. I believe that I've hit many of the redesign points that @pyfisch wanted for this crate. So please check us out and let us know if you spot any problems. It should be mostly a drop-in replacement for serde_cbor and will be actively maintained.

@npmccallum except for the no_std no alloc use case it seems

@npmccallum except for the no_std no alloc use case it seems

The readme discusses a potential low level library which could target that case. I would merge patches for that. But it isn't my primary use case.

It doesn't have Serializer and Deserializer which we can easily use for serde_transcode.

It doesn't have Serializer and Deserializer which we can easily use for serde_transcode.

It does, they're just private because I wasn't sure if anyone actually needed them. We can polish them up and make them public.