udoprog/musli

Feature request: variant of encode_bytes that lets you avoid allocating your own buffer for conversion

cormac-ainc opened this issue · 2 comments

Continuing on from #24 but off-topic for that issue: Most of the floats in this float-heavy format will be done using custom Encode/Decode impls on newtyped Vec<f32> or Vec<Point3>. Doing it by hand with byteorder::NetworkEndian is about 10x faster at encoding and 14x faster at decoding than the Sequence-based implementation on Vec<f32> with variable lengths, and still about 2.3x/3x faster with fixed lengths. (This is on a little-endian box.) Those are pretty significant margins and formats can be designed to chunk all the floats together in big vecs, which helps capitalise on LLVM having a highly optimised and vectorised encoder and decoder. It also makes skipping over all that data extremely quick. A few megabytes is typical for us.

The only pain point is having to either allocate a Vec<u8> or use a thread_local! buffer for the encode impls. I couldn't get musli to give me an encoder for writing raw bytes if I knew the encoded length upfront but didn't have the final byte slice handy. That seems like an opportunity for a really useful new API.

I think it could look a bit like this:

  1. A new type param on the encoder trait, type AsBytes: AsBytesEncoder
  2. A method on the Encoder trait, fn encode_as_bytes(self, length: usize) -> Result<Self::AsBytes, ...>
  3. A method on AsBytes to append a byte slice, at minimum. Maybe other ways of actually writing stuff in there, could include a byteorder::WriteBytesExt-style API but giving back musli-wire results.
  4. An .end() method on the AsBytes trait, that checks you have written the number of bytes you promised to and errors otherwise.

The final API would look like this:

// the encode_as_bytes implementation can reserve all the space required upfront
let slot = enc.encode_as_bytes(100)?;
for i in 0..20 {
    slot.append(b"12345")?; // would error if we went over 100 bytes total
}
slot.end()?; // would error if we wrote under 100 bytes total

One alternative would be to have a similar API but called PackedSequence. It would have a type parameter (via GAT type params e.g. type PackedSequence<T: Encode<M>>: PackedSequenceEncoder<T>) and you could write a &[T] in there or one &T at a time. It might need a trait for packed structs with a statically known size, which would be implemented for f32 and many other primitive types. It would need to be constrained somehow to only work with fixed length int/float encoding etc. Sounds a lot more complicated and less flexible. The encode_as_bytes has no more API surface than that and is a complete solution allowing almost any possible custom packing.

The only limitation of the encode_as_bytes idea is that you do need to know the encoded size upfront. I guess dynamically sized packed types are already solved by using #[musli(packed)] up to the length limit. If you really want a huge dynamically sized one I guess you can just up the limit and increase the size of all the packed length fields.

I like the idea for a bytes encoder which you have to specifically specify size up front, I personally never had a use for it but it seems like you do so I'd be happy to incorporate it.

It makes it compatible with the potential allocator-like methods I want to add to Context (for flexible no_std support) and means that prefix-wire formats can write the lengths up front. This was in relation to me considering dynamically sized packing too. The current implementation in e.g. wire encoder which always uses a FixedBytes on the stack isn't ideal.

Great suggestions overall, and thank you for sharing your thoughts about the topics!