Implement native lib0 encoding/decoding

Question

nugmanoff opened this issue 2 years ago · 5 comments

nugmanoff commented 2 years ago

This can be useful in various ways:

To further improve the process of passing complex types through the Rust bridge (see the related decision log).
It can be used to build a fully fledged native y-protocol implementation.

For reference:

lib0 implementation in Rust
Original lib0 implementation in JavaScript

Note: there is no need to implement all of lib0's functionality; only the encoding/decoding parts are needed.

Answer 1 · 2023-01-26T19:40:25.000Z

Adding a couple of links here that Bartosz provided on Discord (as much for my reference as anyone's). The focus of lib0 encoding and decoding is to provide a serialization format for the underlying CRDT data structure, primarily so that the encoded form doesn't add a large memory overhead to the data being represented. There is a lot of extraordinarily good detail about this problem space in the video "CRDTs: The Hard Parts" by Martin Kleppmann.

Bartosz also has details about v1 vs v2 serialization and what it means in the underlying Yrs code at https://bartoszsypytkowski.com/yrs-architecture/#serialization

Answer 2 · 2023-01-27T09:32:06.000Z

I see, that makes sense. Still, if I understand correctly, we can leverage lib0 for general-purpose binary encoding and decoding as well, because, as Kevin noted in his original JavaScript implementation, it is just a schema-less binary encoding format that is compatible with (or at least similar to) Google's Protocol Buffers.
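To make the "similar to Protobuf" point concrete, here is a minimal sketch of the variable-length unsigned integer scheme that both formats build on: 7 payload bits per byte, least-significant group first, with the high bit set when more bytes follow. The function names `write_var_uint` and `read_var_uint` are illustrative choices for this sketch, not a claim about the exact lib0 API.

```rust
/// Append `value` to `buf` as a variable-length unsigned integer.
/// Each byte carries 7 payload bits (least-significant group first);
/// the high bit is set on every byte except the last.
pub fn write_var_uint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Read a variable-length unsigned integer from the front of `bytes`.
/// Returns the decoded value and the number of bytes consumed, or `None`
/// if the input ends early or runs longer than a u64 can hold.
pub fn read_var_uint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Small values cost a single byte (127 encodes as `0x7f`), while 300 takes two (`0xac 0x02`), which is where most of the space savings over a textual format come from.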

Answer 3 · 2023-02-20T19:02:05.000Z

I also wanted to point out the potential for using BinaryCodable (MIT licensed) for serializing arbitrary Swift types into a compact representation to pass through to the Rust/Yrs side of things. I haven't quite figured out how we might use lib0 in that capacity, so it may be a better solution. That said, I think BinaryCodable is something we should seriously consider, primarily for the following reasons:

As I understand it, lib0 is primarily Rust-oriented code, so we'd be encoding any relevant types from Swift into some neutral format to pass them through to Rust, where we could then use the lib0 encoding. Correct me if I'm missing the flow here, but it seems like we'd end up doing two rounds of encoding and decoding to get all the way into the core libraries, and then have the same dual-decode path for any returned values.
BinaryCodable builds on top of the Codable protocol that is - for better or worse - the idiomatic Swift standard. It also does variable-length encoding, and has some (limited) compatibility with the same mechanisms used in Google's protobuf implementation. If this works and is sufficient, then we'd only need a single round of encoding and decoding, and only on the Swift side of the bridge.

The obvious downside I see is that if you intentionally wanted to work cross-language, then encoding directly in something akin to JSON or Protobuf would be far more amenable to deserializing into relevant, mapped types in other languages. I don't know how easy it is, or even whether it's possible, to decode the output of BinaryCodable into a type in another language.

Answer 4 · 2023-02-22T17:25:49.000Z

I've also taken a look at BinaryCodable – and I was keeping in mind that we might use it as an inspiration/foundation for idiomatically implementing lib0 in Swift rather than using it directly.

"Correct me if I'm missing the flow here, but it seems like we'd end up doing two rounds of encoding and decoding to get all the way into the core libraries, and then have the same dual-decode path for any returning values."

I think I have undercommunicated my reasoning behind the need for native lib0 in Swift. I will try to explain myself better:

YArray, YMap, YText attributes, and so on are dynamically typed and can store literally any type. That's why we need a way to pass an Any type through the Rust-Swift bridge, which UniFFI does not currently support.
So we need to somehow encode any type on the Swift side and pass it to the Rust side. Currently this is done by leveraging JSON encoding/decoding.
In Yrs, when inserting a value into a YArray, the type of that value must conform to the Prelim trait.
lib0 has an Any type. Conveniently, any type that is convertible to Any (that is, implements Into<Any>) automatically conforms to Prelim, which satisfies the requirement above.
Right now both ypy and yrb convert their Any values (which come bundled with their bindgen frameworks) into a lib0::Any value and then perform operations with it, for example inserting into a YArray.
As I mentioned above, because our bindgen framework doesn't come with its own Any value – we need to come up with a way to do it ourselves.
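The relationship between Any, Into, and Prelim described above can be sketched with a blanket trait implementation. Everything below is a simplified stand-in written for illustration: the `Any` enum covers only a few variants, the `Prelim` trait has a single made-up method (`into_any`), and `ToyArray` is a hypothetical container, so none of this reflects the real Yrs or lib0 definitions.

```rust
/// Hypothetical, reduced stand-in for lib0's dynamically typed value.
#[derive(Debug, Clone, PartialEq)]
pub enum Any {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Any>),
}

// Conversions from a few concrete types into the dynamic value.
impl From<bool> for Any {
    fn from(v: bool) -> Self {
        Any::Bool(v)
    }
}

impl From<f64> for Any {
    fn from(v: f64) -> Self {
        Any::Number(v)
    }
}

impl<'a> From<&'a str> for Any {
    fn from(v: &'a str) -> Self {
        Any::String(v.to_string())
    }
}

/// Hypothetical stand-in for the trait a value must satisfy before it
/// can be inserted into a shared collection.
pub trait Prelim {
    fn into_any(self) -> Any;
}

/// Blanket implementation: anything convertible into `Any` is `Prelim`.
/// This includes `Any` itself, because `From<T> for T` always exists.
impl<T: Into<Any>> Prelim for T {
    fn into_any(self) -> Any {
        self.into()
    }
}

/// Hypothetical array that only accepts `Prelim` values.
#[derive(Default)]
pub struct ToyArray {
    items: Vec<Any>,
}

impl ToyArray {
    pub fn insert<V: Prelim>(&mut self, index: usize, value: V) {
        self.items.insert(index, value.into_any());
    }

    pub fn items(&self) -> &[Any] {
        &self.items
    }
}
```

With this shape, a binding only has to produce an `Any` from whatever the host language hands over; `insert` then accepts it without any further glue.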

Right now the following steps are taken to insert an element into a YArray:

→ insert function is called with value of any Swift type
→ value is encoded as JSON string
→ insert function from Rust is called with JSON string
→ Rust decodes the JSON string into a lib0::Any (thanks to its JSON functionality)
→ resultant lib0::Any value is inserted

Here is how I think this could be shortened by implementing native lib0 encoding on the Swift side:

→ insert function is called with value of any Swift type
→ value is encoded in lib0 binary format as buffer (1)
→ insert function from Rust is called with binary buffer
→ lib0::Any is instantiated from that buffer (2)
→ resultant lib0::Any value is inserted

(1) – I believe this should be faster than JSON encoding (I could be wrong).
(2) – We skip the expensive JSON decoding here and instantiate the lib0::Any directly from the buffer, because we know it was encoded by the lib0 rules. This is where we need native lib0 encoding.
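A rough sketch of what steps (1) and (2) involve, written in Rust for brevity even though step (1) would live on the Swift side. It is built on several assumptions: the `Any` enum is a reduced stand-in, the tag bytes are the values I recall from lib0's `writeAny` and should be checked against the lib0 source, and every number is written as a big-endian float64, whereas lib0 also has more compact integer and float32 forms.

```rust
/// Hypothetical, reduced stand-in for lib0's dynamically typed value.
#[derive(Debug, Clone, PartialEq)]
pub enum Any {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Any>),
}

// Tag bytes as recalled from lib0's `writeAny` (assumption: verify first).
const TAG_NULL: u8 = 126;
const TAG_FLOAT64: u8 = 123;
const TAG_FALSE: u8 = 121;
const TAG_TRUE: u8 = 120;
const TAG_STRING: u8 = 119;
const TAG_ARRAY: u8 = 117;

// Variable-length unsigned integer: 7 bits per byte, high bit = "more".
fn write_var_uint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

fn read_var_uint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut value, mut shift) = (0u64, 0u32);
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Step (1): turn a dynamic value into a tagged binary buffer.
pub fn encode_any(value: &Any, buf: &mut Vec<u8>) {
    match value {
        Any::Null => buf.push(TAG_NULL),
        Any::Bool(false) => buf.push(TAG_FALSE),
        Any::Bool(true) => buf.push(TAG_TRUE),
        Any::Number(n) => {
            buf.push(TAG_FLOAT64);
            buf.extend_from_slice(&n.to_be_bytes());
        }
        Any::String(s) => {
            buf.push(TAG_STRING);
            write_var_uint(buf, s.len() as u64); // UTF-8 byte length
            buf.extend_from_slice(s.as_bytes());
        }
        Any::Array(items) => {
            buf.push(TAG_ARRAY);
            write_var_uint(buf, items.len() as u64);
            for item in items {
                encode_any(item, buf);
            }
        }
    }
}

/// Step (2): rebuild the value from the buffer, advancing `pos`.
/// Returns `None` on truncated input, bad UTF-8, or an unknown tag.
pub fn decode_any(bytes: &[u8], pos: &mut usize) -> Option<Any> {
    let tag = *bytes.get(*pos)?;
    *pos += 1;
    match tag {
        TAG_NULL => Some(Any::Null),
        TAG_FALSE => Some(Any::Bool(false)),
        TAG_TRUE => Some(Any::Bool(true)),
        TAG_FLOAT64 => {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(bytes.get(*pos..*pos + 8)?);
            *pos += 8;
            Some(Any::Number(f64::from_be_bytes(raw)))
        }
        TAG_STRING => {
            let len = read_var_uint(bytes, pos)? as usize;
            let end = pos.checked_add(len)?;
            let raw = bytes.get(*pos..end)?;
            *pos = end;
            String::from_utf8(raw.to_vec()).ok().map(Any::String)
        }
        TAG_ARRAY => {
            let len = read_var_uint(bytes, pos)?;
            let mut items = Vec::new();
            for _ in 0..len {
                items.push(decode_any(bytes, pos)?);
            }
            Some(Any::Array(items))
        }
        _ => None,
    }
}
```

Note that step (2) is still a decode pass over the buffer; the potential saving relative to the JSON path is that there is no text to tokenize and no number or string-escape parsing, which is the kind of claim a benchmark would need to confirm.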

Answer 5 · 2023-02-22T21:11:23.000Z

Great detail, thank you! As a general flow, that makes a lot of sense, and I think I see where you're going but I'm not sure about the proposed flow using lib0 - at least the first couple of steps.

→ insert function is called with value of any Swift type
→ value is encoded in lib0 binary format as buffer (1)

The insert function being called with an arbitrary Swift type makes sense, including the box-of-everything Swift Any type. What I'm not sure of is how you can use lib0 to get from a Swift Any type into a binary buffer. I'd presumed that was something that absolutely had to be done on the Swift side.

Once it's a bucket of bytes, then that totally makes sense, which is why I was thinking that BinaryCodable would be nice: it is capable of the back-and-forth conversion from a specific Swift type (as long as it conforms to the Codable protocol) into a relatively compact bucket of bytes.

Maybe the assumption I'm making is that lib0 lives entirely on the Rust side of the bridge. If we hand it an explicit Swift type, does it know how to "disassemble" and "reassemble" that into the relevant types, or does it require an explicit conversion into a Swift Any type, where we hide all of the inherent type information? Likewise, if it converts back to an Any type, can that be successfully cast (using something like as! in Swift) back into the relevant type that was originally stored?

In my glancing through the lib0 pieces at https://github.com/dmonad/lib0, I was thinking that perhaps you were taking a path of:

some Swift type that conforms to Codable → (encode w/ JSONEncoder, or speedy equiv) → JSON → (encode JSON, passed as raw str into Rust side with lib0) → bucket-of-bytes that the internal systems treat as opaque.

Is there an optimized path using lib0 that lets you:

some Swift type that conforms to Codable → (encode directly with lib0) → bucket-of-bytes that we can pass over to Rust and Yrs functions as some opaque binary blob

(Side note: I've been poking at a new Swift benchmarking library that might be fun to use for comparing the speed of operations. It's package-benchmark, and it looks reasonably easy to knock together quick benchmarks for "is this faster than that" kinds of questions.)