Implement native lib0 encoding/decoding
nugmanoff opened this issue · 5 comments
This can be useful in various ways:
- To further improve the process of passing complex types through the Rust bridge (see the related decision log).
- To build a fully fledged native y-protocol implementation.
For reference:
- lib0 implementation in Rust
- Original lib0 implementation in JavaScript
Note: there is no need to implement all of lib0's functionality; only the encoding/decoding parts are needed.
Adding a couple of links here that Bartosz provided on Discord (as much for my reference as anyone's). The focus of lib0 encoding and decoding is to provide a serialization format for the underlying CRDT data structure, primarily so that the encoded form doesn't add a huge memory overhead on top of the data being represented. There's a lot of extraordinarily good detail about this problem space in the video "CRDTs: The Hard Parts" by Martin Kleppmann.
Bartosz also has details about v1 vs v2 serialization and what it means in the underlying Yrs code at https://bartoszsypytkowski.com/yrs-architecture/#serialization
I see, that makes sense. But if I understand correctly, we can still leverage lib0 for general-purpose binary encoding and decoding as well. As Kevin noted in his original JS implementation, it is just a schema-less binary encoding format, which is compatible with (or at least similar to) Google's Protobufs.
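As a rough illustration of the "schema-less binary encoding" part: the basic building block is a variable-length unsigned integer, laid out like Protobuf's base-128 varints. Below is a minimal, dependency-free Rust sketch. The names write_var_uint and read_var_uint are hypothetical. The byte layout is assumed to mirror lib0's writeVarUint/readVarUint rather than copied from lib0's source.

```rust
/// Unsigned variable-length integer, assumed to mirror lib0's `writeVarUint`:
/// 7 payload bits per byte, least-significant group first, and the high bit
/// (0x80) set on every byte except the last.
fn write_var_uint(buf: &mut Vec<u8>, mut n: u64) {
    while n > 0x7F {
        buf.push(0x80 | (n & 0x7F) as u8);
        n >>= 7;
    }
    buf.push(n as u8);
}

/// Reads one varint starting at `*pos` and advances `*pos` past it.
/// Returns `None` if the buffer ends early or the encoding is too long.
fn read_var_uint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut result, mut shift) = (0u64, 0u32);
    while shift < 64 {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
    None // too many continuation bytes: malformed input
}
```

Encoding 300 produces the two bytes 0xAC 0x02, which are the same bytes Protobuf uses for 300. Values up to 127 take a single byte. The raw primitives carry no field names or schema. The reader simply has to know what comes next, or, for lib0's Any values, read a type tag first.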
I also wanted to point out the potential for using BinaryCodable (MIT licensed) to serialize arbitrary Swift types into a compact representation that can be passed through to the Rust/Yrs side of things. I haven't quite figured out how we might use lib0 in that capacity, so lib0 may well be the better solution. That said, I think BinaryCodable deserves serious consideration, primarily for the following reasons:
- As I understand it, lib0 is primarily Rust-oriented code, so we'd be encoding any relevant types from Swift into some neutral format to pass them through to Rust, where we could then use the lib0 encoding. Correct me if I'm missing the flow here, but it seems like we'd end up doing two rounds of encoding and decoding to get all the way into the core libraries, and then have the same dual-decode path for any returned values.
- BinaryCodable builds on top of the Codable protocol, which is (for better or worse) the idiomatic Swift standard. It also does variable-length encoding and has some (limited) compatibility with the mechanisms used in Google's protobuf implementation. If this works and is sufficient, we'd only need a single round of encoding and decoding, and only on the Swift side of the bridge.
The obvious downside I see is this: if you intentionally wanted to work cross-language, an encoding akin to JSON or Protobuf would be far more amenable to deserializing into the relevant mapped types in other languages. I don't know how easy it is, or even whether it's possible, to decode BinaryCodable output into a type in another language.
I've also taken a look at BinaryCodable, and I was keeping in mind that we might use it as inspiration or a foundation for idiomatically implementing lib0 in Swift, rather than using it directly.
> Correct me if I'm missing the flow here, but it seems like we'd end up doing two rounds of encoding and decoding to get all the way in to the core libraries, and then have the same dual-decode path for any returning values.
I think I under-communicated my reasoning behind the need for native lib0 in Swift. Let me try to explain it better:
- YArray, YMap, YText attributes, etc. are dynamically typed and can store literally any type. That's why we need a way to pass an Any type through the Rust-Swift bridge, which UniFFI currently does not support.
- So we need to be able to somehow encode any type on the Swift side and pass it to the Rust side. Currently this is done by leveraging JSON encoding/decoding.
- In Yrs, when inserting a value into the YArray, the type of that value needs to conform to the Prelim trait. lib0 has an Any type. Conveniently, any Any-convertible type (one that implements the Into trait) automatically conforms to Prelim, which satisfies the requirement above.
- Right now both ypy and yrb convert their Any values (which come bundled with their bindgen frameworks) into a lib0::Any value and then perform operations with it, e.g. inserting into the YArray.
- As I mentioned above, our bindgen framework doesn't come with its own Any value, so we need to come up with a way to do this ourselves.
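The Prelim/Into point above can be made concrete with a minimal Rust sketch of the pattern that has no Yrs dependency. It has three parts:
- a dynamically typed value enum,
- From impls that make ordinary Rust types convertible into it,
- a generic insert that accepts any V: Into<AnyValue>.

AnyValue and SketchArray are hypothetical stand-ins for lib0::Any and YArray. The real Prelim trait is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dynamically typed value such as `lib0::Any`.
/// The variants loosely mirror what lib0 can encode; this is not the real type.
#[derive(Debug, Clone, PartialEq)]
enum AnyValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<AnyValue>),
    Map(HashMap<String, AnyValue>),
}

// Each `From` impl makes a Rust type "Any-convertible" (it gains `Into<AnyValue>`).
impl From<bool> for AnyValue {
    fn from(b: bool) -> Self {
        AnyValue::Bool(b)
    }
}

impl From<f64> for AnyValue {
    fn from(n: f64) -> Self {
        AnyValue::Number(n)
    }
}

impl<'a> From<&'a str> for AnyValue {
    fn from(s: &'a str) -> Self {
        AnyValue::String(s.to_owned())
    }
}

// A Vec of convertible items is itself convertible, element by element.
impl<T: Into<AnyValue>> From<Vec<T>> for AnyValue {
    fn from(items: Vec<T>) -> Self {
        AnyValue::Array(items.into_iter().map(Into::into).collect())
    }
}

/// Hypothetical array type: `insert` accepts any `V: Into<AnyValue>`, the same
/// shape of bound that (per the description above) lets `Into<lib0::Any>`
/// types satisfy Yrs's `Prelim`.
#[derive(Default)]
struct SketchArray {
    items: Vec<AnyValue>,
}

impl SketchArray {
    fn insert<V: Into<AnyValue>>(&mut self, index: usize, value: V) {
        self.items.insert(index, value.into());
    }
}
```

The conversion happens entirely on the Rust side, through ordinary trait bounds. That open-ended conversion is what cannot currently cross the UniFFI bridge as an Any value, so the Swift side has to serialize the value first.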
Right now, the following steps are taken to insert an element into the YArray:
→ the insert function is called with a value of any Swift type
→ the value is encoded as a JSON string
→ the insert function from Rust is called with the JSON string
→ Rust decodes the JSON string into lib0::Any (thanks to its JSON functionality)
→ the resulting lib0::Any value is inserted
Here is how I think this could be streamlined by implementing native lib0 encoding on the Swift side:
→ the insert function is called with a value of any Swift type
→ the value is encoded in the lib0 binary format as a buffer (1)
→ the insert function from Rust is called with the binary buffer
→ lib0::Any is instantiated from that buffer (2)
→ the resulting lib0::Any value is inserted
(1) I believe this should be faster than JSON encoding (I could be wrong).
(2) We skip the expensive decoding part here and just instantiate the lib0::Any type directly from the buffer, because we know it was encoded according to lib0's rules. That's the place where we need native lib0 encoding.
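Steps (1) and (2) can be made more concrete with a rough Rust sketch. It has a lib0-style writer that emits a one-byte type tag followed by the payload, and a reader that rebuilds the value in a single pass over the buffer. It only covers null, booleans, integers and strings. The tag constants and the signed-varint layout are my reading of lib0's JavaScript encoding.js and should be checked against the real implementation. AnyValue is a hypothetical stand-in for lib0::Any. A native Swift encoder would have to emit these same bytes.

```rust
use std::convert::TryFrom; // needed on editions before 2021

// Type tags assumed to match lib0's JavaScript `writeAny`; check encoding.js.
const TAG_NULL: u8 = 126;
const TAG_INT: u8 = 125;
const TAG_FALSE: u8 = 121;
const TAG_TRUE: u8 = 120;
const TAG_STRING: u8 = 119;

/// Hypothetical subset of the values `lib0::Any` can hold (no arrays/maps/floats).
#[derive(Debug, Clone, PartialEq)]
enum AnyValue {
    Null,
    Bool(bool),
    Int(i32), // lib0 uses this tag only for |n| <= 2^31 - 1, so i32::MIN is out of scope
    String(String),
}

/// Unsigned varint: 7 bits per byte, low bits first, 0x80 means "more bytes follow".
fn write_var_uint(buf: &mut Vec<u8>, mut n: u64) {
    while n > 0x7F {
        buf.push(0x80 | (n & 0x7F) as u8);
        n >>= 7;
    }
    buf.push(n as u8);
}

fn read_var_uint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut result, mut shift) = (0u64, 0u32);
    while shift < 64 {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
    None // too many continuation bytes: malformed input
}

/// Signed varint (assumed lib0 layout): the first byte holds a continuation bit
/// (0x80), a sign bit (0x40) and 6 magnitude bits; the rest follows as a var_uint.
fn write_var_int(buf: &mut Vec<u8>, n: i64) {
    let mag = n.unsigned_abs();
    let cont: u8 = if mag > 0x3F { 0x80 } else { 0 };
    let sign: u8 = if n < 0 { 0x40 } else { 0 };
    buf.push(cont | sign | (mag & 0x3F) as u8);
    if mag > 0x3F {
        write_var_uint(buf, mag >> 6);
    }
}

fn read_var_int(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let first = *buf.get(*pos)?;
    *pos += 1;
    let mut mag = u64::from(first & 0x3F);
    if first & 0x80 != 0 {
        mag |= read_var_uint(buf, pos)? << 6;
    }
    let mag = i64::try_from(mag).ok()?;
    Some(if first & 0x40 != 0 { -mag } else { mag })
}

/// Step (1): append one value to a lib0-style buffer as a type tag plus payload.
fn write_any(buf: &mut Vec<u8>, value: &AnyValue) {
    match value {
        AnyValue::Null => buf.push(TAG_NULL),
        AnyValue::Bool(b) => buf.push(if *b { TAG_TRUE } else { TAG_FALSE }),
        AnyValue::Int(n) => {
            buf.push(TAG_INT);
            write_var_int(buf, i64::from(*n));
        }
        AnyValue::String(s) => {
            buf.push(TAG_STRING);
            write_var_uint(buf, s.len() as u64); // UTF-8 byte length, then the bytes
            buf.extend_from_slice(s.as_bytes());
        }
    }
}

/// Step (2): rebuild the value straight from the buffer (no JSON parsing); None on bad input.
fn read_any(buf: &[u8], pos: &mut usize) -> Option<AnyValue> {
    let tag = *buf.get(*pos)?;
    *pos += 1;
    match tag {
        TAG_NULL => Some(AnyValue::Null),
        TAG_TRUE => Some(AnyValue::Bool(true)),
        TAG_FALSE => Some(AnyValue::Bool(false)),
        TAG_INT => i32::try_from(read_var_int(buf, pos)?).ok().map(AnyValue::Int),
        TAG_STRING => {
            let len = usize::try_from(read_var_uint(buf, pos)?).ok()?;
            let end = pos.checked_add(len)?;
            let bytes = buf.get(*pos..end)?;
            *pos = end;
            String::from_utf8(bytes.to_vec()).ok().map(AnyValue::String)
        }
        _ => None, // arrays, maps, floats, bigints, buffers, undefined: not sketched
    }
}
```

For example, AnyValue::Int(42) encodes as [125, 42] and AnyValue::Int(-100) as [125, 228, 1]. As far as I can tell from lib0, arrays and maps follow the same pattern: a tag, a varuint count, then each element or key/value pair. They are left out here to keep the sketch short. Whether this path is measurably faster than the JSON one is still the open question from footnote (1).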
Great detail, thank you! As a general flow that makes a lot of sense, and I think I see where you're going. But I'm not sure about the proposed flow using lib0, at least the first couple of steps:
> → insert function is called with value of any Swift type
> → value is encoded in lib0 binary format as buffer (1)
The insert function being called with an arbitrary Swift type makes sense, including the box-of-everything Any Swift type. What I'm not sure about is how you can use lib0 to get from a Swift Any type into a binary buffer. I'd presumed that was something that absolutely had to be done on the Swift side.
Once it's a bucket of bytes, that totally makes sense. That's why I was thinking BinaryCodable would be nice: it can convert back and forth between a specific Swift type (as long as it conforms to the Codable protocol) and a relatively compact bucket of bytes.
Maybe the assumption I'm making is that lib0 lives entirely on the Rust side of the bridge. If we hand it an explicit Swift type, does it know how to "disassemble" and "reassemble" it into the relevant types? Or does it require an explicit conversion into a Swift Any type, which hides all of the inherent type information? Likewise, if it converts back to an Any type, can that be successfully cast (using something like as! in Swift) back into the type that was originally stored?
Glancing through the lib0 pieces at https://github.com/dmonad/lib0, I was thinking that perhaps you were taking a path of:
some Swift type that conforms to Codable → (encode with JSONEncoder, or a speedy equivalent) → JSON → (encode JSON, passed as a raw str into the Rust side with lib0) → bucket-of-bytes that the internal systems treat as opaque.
Is there an optimized path using lib0 that lets you go:
some Swift type that conforms to Codable → (encode directly with lib0) → bucket-of-bytes that we can pass over to Rust and Yrs functions as an opaque binary blob?
(Side note: I've been poking at a new Swift benchmarking library that might be fun to use for some speed comparisons. It's package-benchmark, and it looks reasonably easy to knock together quick benchmarks for "is this faster than that?" kinds of questions.)