Serde Derive Support
casey opened this issue ยท 9 comments
I've been using serde-bencode, but I ran into some issues, so I'm reconsidering my choice of library.
Bendy looks great, but I'm making heavy use of serde derived serialization and deserialization, which I think Bendy doesn't do.
Is this a feature which bendy might eventually support?
Thanks for your interest!
While serde is not a high priority for our own usage, it should be fairly straightforward to implement the necessary traits, and we'd happily work with you to add serde support. I predict that it would be well under a week of work, probably only a day or two.
The main reason that it doesn't exist yet is that bencode requires that dictionaries be stored sorted by key, which serde doesn't guarantee. This means that any time a dictionary is output, it would need to be buffered, and our use case for bendy needs to work on extremely memory-constrained devices which precludes our own internal use of serde. However, because we do occasionally need the functionality, there's already support for emitting unsorted dictionaries with the necessary buffering.
Thank you very much for the information!
That all sounds good, I think it's likely that I'll take a crack at it. When I inevitably run into problems with the implementation, I'll let you know :)
I just started implementing Serde serialization, and there's a bit of an impedance mismatch between Serde's trait-based serialization and the bendy's callback-based encoding for compound objects, for example sequences/lists.
When serializing a sequence, serde first calls Serializer::serialize_seq
, which should return an object that implements serde::ser::SerializeSeq
. SerializeSeq::serialize_element(&mut self, value: &impl Serialize)
is then called on this object with each value to serialize, and then finally SerializeSeq::end(self)
.
This is somewhat at odds with bendy's Encoder::emit_list. Since emit_list
requires a callback which, when called, must pass all values to be serialized in sequence to an encoder, all values must be ready before emit_list
is called. However, Serializer::serialize_seq
does not provide the values up front.
I don't have any particularly elegant ideas for how to resolve this. A decidedly non-elegant possibility is to add something like bendy::encoding::Encoder::begin_list() -> ListEncoder
, which returns a new type ListEncoder
, upon which emit(value)
can be called to emit values, in addition to end()
, which consumes the ListEncoder
and emits the closing e
. Between Encoder::begin_list
and ListEncoder::end
, doing anything with the original Encoder
would be verboten and a runtime error. However, this doesn't have the nice static guarantees of the original callback based-system, and somewhat complicates the interface.
What are your thoughts?
Yeah, those two interfaces do seem to be incompatible :-(
My first reaction is to have a parallel set of encoding structs for Serde that use Encoder::emit_token
internally. This will still give you Bendy's structure checking, but it will give you free rein to export an interface compatible with Serde without marring the safe bendy-native interface.
That said, it looks like you're working in a separate crate, so you don't have access to the emit_token
function. The best option that I can see is to instead add your serde support to bendy itself, enabled by a feature flag.
If you are committed to working in a separate crate, then the next best option is to add a type called TokenEncoder
to Bendy that exposes the emit_token
function while enforcing that it is used to emit a valid structure. This would leave you responsible for sorting dictionaries, but you can use the code in UnsortedDictEncoder
as an example for that; the code is fairly straightforward. If this is something you'd like, I should be able to have it for you by Monday evening.
If at all possible, though, I don't want to expose a ListEncoder like you described from the statically safe interface that already exists. Aside from adding an attractive nuisance (begin_list
seems like the obvious method to write a list, but it would be the one that's easier to misuse), it would also add a new way to misuse the API that is totally avoidable. (Right now, there are only three ways to get an error out of bendy's encoder: return it yourself, nest the structure too deep, or do nothing with a SingleItemEncoder)
Thanks for the feedback!
Using emit_token
sounds like a good option. I don't have any particular reason to work in a separate crate, so I'll switch to adding serde support directly to bendy under a feature flag.
I ran into an interesting issue replacing serde_bencode
in a personal project with my in-progress fork of bendy
with serde suport.
serde_bencodes
serializes None
as the empty string, and Some(x)
as the bencode for x
. This is leveraged in struct serialization and deserialization to skip optional fields with None
values. So Struct {field: None}
serializes as de
, and Struct{field: Some(1)}
serializes as d5:fieldi1ee
.
This is a bit odd, but it does mean that a user can derive Serialize
and Deserialize
on a struct with optional fields, and the derived implementations will be compatible with bittorrent metainfo, a common use for bencode, which skips optional fields:
#[derive(Serialize, Deserialize)]
struct Metainfo {
announce: String, // non optional announce URL
comment: Option<String>, // optional comment field
// more fields...
}
I happen to be using serde_bencode
to serialize and deserialize bittorrent metadata, so this made serde_bencode
very convenient.
However, it's a bit weird. Some(())
and None
both serialize to the empty string. (Unit
is also represented with the empty string.). A Vec<Option<T>>
can round-trip to a value with fewer values, since the None
values will be skipped when serializing and so won't be present when deserializing. I think this also causes some of the other issues I've had with serde_bencode
.
All this made me reconsider how best to map bencode to and from Serde's data model. I can think of a few ways to move forward:
- Replicate
serde_bencode
's behavior - Only support native bencode types
- Support all serde data model types
Replicate serde_bencode
's behavior
Pros:
- Nice when working with formats that skip optional fields
Cons:
- Generally strange
- Weird foot guns, for example
Vec<Option<T>>
values in the scenario above Serialize
can produce invalid bencode, andDeserialize
can consume invalid bencode, namely empty strings
Only support native bencode types
This would make Serialize
and Deserialize
panic on types without native bencode encodings, e.g. Option
, ()
, f32
, et al.
Pros:
- No weird footguns
- No empty strings
- No non-obvious encodings
Cons:
- Can't use bencode as a general-purpose serialization format
- Requires serde annotations when working with formats that skip optional fields
Support all serde data model types
Use a reasonable encoding for all serde data model types, using the empty list as a nil
value. Some examples:
true
<->i1e
false
<->i0e
()
<->le
Some(0)
<->li0ee
None
<->le
f32
<->4:XXXX
(XXXX are the bytes produced byf32::to_le_bytes()
)
Pros:
- No weird footguns
- Can use bencode as general purpose serialization format for all serde data model types
Cons:
- The encodings of non bencode native types will be specific to
bendy
, since they aren't part of the bencode standard - Requires serde annotations when working with formats that skip optional fields
Summary
The first option is convenient, but I think producing invalid bencode for some types is a non starter, and I don't like the idea that certain values don't round-trip.
Between the last two options, I'm personally leaning towards the latter, coming up with lossless mappings for all serde data model types.
What do you think?
I just opened #31 with an initial implementation with support only for bencode-native types. I figured that it would be good to keep the initial PR manageable.
Stuff for future PR(s):
- add support for non-bencode-native types
- add support for
deserialize_any
, which would allow, among other things, deserializing untagged enums - add support to serialize to and deserialize from bendy
Object
types