Feature request: variant of encode_bytes that lets you avoid allocating your own buffer for conversion
cormac-ainc opened this issue · 2 comments
Continuing on from #24 but off-topic for that issue: most of the floats in this float-heavy format will be handled by custom Encode/Decode impls on newtyped Vec<f32> or Vec<Point3>. Doing it by hand with byteorder::NetworkEndian is about 10x faster at encoding and 14x faster at decoding than the Sequence-based implementation on Vec<f32> with variable lengths, and still about 2.3x/3x faster with fixed lengths. (This is on a little-endian box.) Those are pretty significant margins, and formats can be designed to chunk all the floats together in big vecs, which helps capitalise on LLVM generating highly optimised, vectorised encode and decode loops. It also makes skipping over all that data extremely quick. A few megabytes is typical for us.
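For anyone following along, the hand-rolled approach could look roughly like the sketch below. This is not the code that was benchmarked: it uses the standard library's f32::to_be_bytes and f32::from_be_bytes in place of byteorder::NetworkEndian (both mean big-endian), and the function names encode_f32s and decode_f32s are made up for illustration.

```rust
// Std-only sketch of packing a float slice into network-order bytes.
// `encode_f32s` and `decode_f32s` are illustrative names, not musli or byteorder APIs.

/// Encode a slice of f32 values as contiguous big-endian (network order) bytes.
pub fn encode_f32s(values: &[f32]) -> Vec<u8> {
    // The output length is known upfront: 4 bytes per float.
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}

/// Decode contiguous big-endian bytes back into f32 values.
/// Returns None if the input length is not a multiple of 4.
pub fn decode_f32s(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The loops are simple enough that the compiler has a good chance of vectorising them, and the encoded length (4 bytes per float) is known before any byte is written, which is what the rest of this request relies on.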
The only pain point is having to either allocate a Vec<u8> or use a thread_local! buffer for the encode impls. I couldn't get musli to give me an encoder for writing raw bytes if I knew the encoded length upfront but didn't have the final byte slice handy. That seems like an opportunity for a really useful new API.
I think it could look a bit like this:
- A new associated type on the Encoder trait, type AsBytes: AsBytesEncoder
- A method on the Encoder trait, fn encode_as_bytes(self, length: usize) -> Result<Self::AsBytes, ...>
- A method on AsBytes to append a byte slice, at minimum. There could be other ways of writing into it, such as a byteorder::WriteBytesExt-style API that gives back musli-wire results.
- An .end() method on the AsBytes trait that checks you have written the number of bytes you promised and errors otherwise.
The final API would look like this:
    // the encode_as_bytes implementation can reserve all the space required upfront
    let mut slot = enc.encode_as_bytes(100)?;
    for _ in 0..20 {
        slot.append(b"12345")?; // would error if we went over 100 bytes total
    }
    slot.end()?; // would error if we wrote under 100 bytes total
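To make the length-checking behaviour concrete, here is a minimal sketch of how such a slot might behave when backed by a plain Vec<u8>. Everything in it is hypothetical: the AsBytesEncoder trait shape, the LengthError type, and the VecAsBytes struct are assumptions for illustration and are not existing musli APIs. VecAsBytes::new stands in for encode_as_bytes(length), and a real wire format would presumably also write a length prefix.

```rust
/// Hypothetical error type for this sketch; not a musli type.
#[derive(Debug, PartialEq, Eq)]
pub enum LengthError {
    /// An append would exceed the promised length.
    Overflow { promised: usize, attempted: usize },
    /// `end` was called before the promised length was reached.
    Underflow { promised: usize, written: usize },
}

/// Hypothetical trait mirroring the proposal: append raw bytes, then finish.
pub trait AsBytesEncoder {
    fn append(&mut self, bytes: &[u8]) -> Result<(), LengthError>;
    fn end(self) -> Result<(), LengthError>;
}

/// Vec-backed slot that tracks how many bytes were promised and written.
pub struct VecAsBytes<'a> {
    out: &'a mut Vec<u8>,
    promised: usize,
    written: usize,
}

impl<'a> VecAsBytes<'a> {
    /// Stand-in for `encode_as_bytes(length)`: reserves all space upfront.
    pub fn new(out: &'a mut Vec<u8>, promised: usize) -> Self {
        out.reserve(promised);
        Self { out, promised, written: 0 }
    }
}

impl AsBytesEncoder for VecAsBytes<'_> {
    fn append(&mut self, bytes: &[u8]) -> Result<(), LengthError> {
        let attempted = self.written + bytes.len();
        if attempted > self.promised {
            // Reject the write without touching the output buffer.
            return Err(LengthError::Overflow { promised: self.promised, attempted });
        }
        self.out.extend_from_slice(bytes);
        self.written = attempted;
        Ok(())
    }

    fn end(self) -> Result<(), LengthError> {
        if self.written != self.promised {
            return Err(LengthError::Underflow { promised: self.promised, written: self.written });
        }
        Ok(())
    }
}
```

In this sketch append takes &mut self and end consumes the slot, so nothing can be written after the length check has run.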
One alternative would be to have a similar API but called PackedSequence. It would have a type parameter (via GAT type params, e.g. type PackedSequence<T: Encode<M>>: PackedSequenceEncoder<T>) and you could write a &[T] in there or one &T at a time. It might need a trait for packed structs with a statically known size, which would be implemented for f32 and many other primitive types. It would need to be constrained somehow to only work with fixed-length int/float encoding etc. That sounds a lot more complicated and less flexible. The encode_as_bytes approach has no more API surface than what is shown above and is a complete solution allowing almost any possible custom packing.
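For comparison, the "statically known size" trait that the PackedSequence alternative would need might look something like the sketch below. The FixedSize trait, its SIZE constant, the write_be method, the pack_slice helper, and the field layout of Point3 are all assumptions made up for illustration; none of them exist in musli.

```rust
/// Hypothetical trait for values whose encoded size is known at compile time.
pub trait FixedSize {
    /// Number of bytes `write_be` appends.
    const SIZE: usize;
    /// Append exactly `Self::SIZE` big-endian bytes to `out`.
    fn write_be(&self, out: &mut Vec<u8>);
}

impl FixedSize for f32 {
    const SIZE: usize = 4;
    fn write_be(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_be_bytes());
    }
}

/// Illustrative stand-in for the Point3 type mentioned earlier;
/// the three f32 fields are an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl FixedSize for Point3 {
    const SIZE: usize = 3 * <f32 as FixedSize>::SIZE;
    fn write_be(&self, out: &mut Vec<u8>) {
        self.x.write_be(out);
        self.y.write_be(out);
        self.z.write_be(out);
    }
}

/// Pack a slice of fixed-size items; the total length is `items.len() * T::SIZE`.
pub fn pack_slice<T: FixedSize>(items: &[T]) -> Vec<u8> {
    let mut out = Vec::with_capacity(items.len() * T::SIZE);
    for item in items {
        item.write_be(&mut out);
    }
    out
}
```

Even in this reduced form the trait has to be implemented for every packable type, whereas the byte-slot approach only needs the caller to compute items.len() * SIZE once and pass it as the promised length.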
The only limitation of the encode_as_bytes idea is that you do need to know the encoded size upfront. I guess dynamically sized packed types are already solved by using #[musli(packed)] up to the length limit. If you really want a huge dynamically sized one I guess you can just up the limit and increase the size of all the packed length fields.
I like the idea of a bytes encoder where you have to specify the size up front. I personally never had a use for it, but it seems like you do, so I'd be happy to incorporate it.
It makes it compatible with the potential allocator-like methods I want to add to Context (for flexible no_std support) and means that prefix-wire formats can write the lengths up front. This was in relation to me considering dynamically sized packing too. The current implementation in e.g. the wire encoder, which always uses a FixedBytes on the stack, isn't ideal.
Great suggestions overall, and thank you for sharing your thoughts about the topics!