3Hren/msgpack-rust

Is it possible to get the number of bytes read immediately after deserializing from a slice?

aalekhpatel07 opened this issue · 3 comments

I'm trying to use rmp_serde to send and receive entire messages (enums or structs) via a BytesMut buffer that gets populated and emptied by a different part of the system (in this case a TcpStream).

I don't have any custom framing setup so I'm wondering if I could use rmp_serde to tell me the exact size of the serialized representation of the data (i.e. the count of bytes it deserialized) immediately after it has successfully parsed a portion of the stream into the specified type.

If an approach already exists, I apologize for having missed it. Please feel free to point me in the right direction.

I'm picturing an API like:

/// Deserialize a slice into a deserializable data type and return a count of the bytes deserialized if the deserialization was successful.
pub fn from_slice_with_size<'a, T>(input: &'a [u8]) -> (Result<T, Error>, Option<usize>)
where
    T: Deserialize<'a>
{
    ...
}

Here's an example usage:

use bytes::{Buf, BytesMut}; // `Buf` provides `advance`
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub enum Foo {
    Bar(String),
    Baz
}

pub struct Container {
    pub buffer: BytesMut
}

impl Container {
    ...
    fn read_foo(&mut self) -> Result<Option<Foo>, Box<dyn std::error::Error>> {
        if let Ok(foo) = rmp_serde::from_slice::<Foo>(&self.buffer) {
            // Currently, to get the byte count I have to serialize it again.
            // This has to be slower than keeping track of the bytes deserialized
            // while deserializing.
            let bytes_serialized = rmp_serde::encode::to_vec(&foo)?.len();
            self.buffer.advance(bytes_serialized);
            Ok(Some(foo))
        } else {
            // Nothing could be decoded yet (incomplete or invalid data).
            Ok(None)
        }
    }

    /// This doesn't work because there is no `from_slice_with_size` method but 
    /// it'd be neat if there was something that keeps track 
    /// of and outputs the size of the bytes deserialized.
    fn read_foo_with_size(&mut self) -> Result<Option<Foo>, Box<dyn std::error::Error>> {
        if let (Ok(foo), Some(size)) = rmp_serde::from_slice_with_size::<Foo>(&self.buffer) {
            self.buffer.advance(size);
            Ok(Some(foo))
        } else {
            Ok(None)
        }
    }
    ...
    // Some other part takes a `&mut self` and populates the buffer.
    fn fill_up_buffer(&mut self) {
        stream.read_buf(&mut self.buffer).unwrap();
    }
}

No. The serde API can only return the final result once it's 100% complete.

You could use the lower-level rmp crate to read individual items as they come. Or you could wrap the msgpack messages in something else that splits them into chunks (it could be as simple as sending <length><data> pieces over the stream).
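
To illustrate the first suggestion (reading items one at a time), here is a minimal std-only sketch that walks a single MessagePack value by hand and reports how many bytes it occupies. It is not the rmp API: `msgpack_value_len` is a hypothetical helper written against the MessagePack format specification. Note that array and map headers store element counts, not byte sizes (a one-entry map reports 1 however large its contents are), so the only way to find where a container ends is to skip each element in turn.

```rust
/// Returns the number of bytes occupied by the first complete MessagePack
/// value in `buf`, or `None` if the value is truncated or starts with the
/// reserved marker 0xc1. Hypothetical helper, not part of rmp.
fn msgpack_value_len(buf: &[u8]) -> Option<usize> {
    // Reads an `n`-byte big-endian unsigned integer starting at offset `at`.
    fn be(buf: &[u8], at: usize, n: usize) -> Option<usize> {
        let bytes = buf.get(at..at.checked_add(n)?)?;
        Some(bytes.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize))
    }
    // Skips `count` consecutive values starting at offset `pos` and
    // returns the offset just past the last one.
    fn skip(buf: &[u8], mut pos: usize, count: usize) -> Option<usize> {
        for _ in 0..count {
            pos = pos.checked_add(msgpack_value_len(buf.get(pos..)?)?)?;
        }
        Some(pos)
    }

    let marker = *buf.first()?;
    let total = match marker {
        0x00..=0x7f | 0xe0..=0xff => 1, // positive / negative fixint
        0xc0 | 0xc2 | 0xc3 => 1,        // nil, false, true
        0xc1 => return None,            // reserved, never used
        0xcc | 0xd0 => 2,               // u8, i8
        0xcd | 0xd1 => 3,               // u16, i16
        0xca | 0xce | 0xd2 => 5,        // f32, u32, i32
        0xcb | 0xcf | 0xd3 => 9,        // f64, u64, i64
        0xd4..=0xd8 => 2 + (1usize << (marker - 0xd4)), // fixext 1/2/4/8/16
        0xa0..=0xbf => 1 + (marker & 0x1f) as usize,    // fixstr
        0xc4 | 0xd9 => 2 + be(buf, 1, 1)?,              // bin8, str8
        0xc5 | 0xda => 3 + be(buf, 1, 2)?,              // bin16, str16
        0xc6 | 0xdb => 5usize.checked_add(be(buf, 1, 4)?)?, // bin32, str32
        0xc7 => 3 + be(buf, 1, 1)?,                     // ext8
        0xc8 => 4 + be(buf, 1, 2)?,                     // ext16
        0xc9 => 6usize.checked_add(be(buf, 1, 4)?)?,    // ext32
        0x90..=0x9f => skip(buf, 1, (marker & 0x0f) as usize)?,     // fixarray
        0x80..=0x8f => skip(buf, 1, 2 * (marker & 0x0f) as usize)?, // fixmap
        0xdc => skip(buf, 3, be(buf, 1, 2)?)?,                 // array16
        0xdd => skip(buf, 5, be(buf, 1, 4)?)?,                 // array32
        0xde => skip(buf, 3, be(buf, 1, 2)?.checked_mul(2)?)?, // map16
        0xdf => skip(buf, 5, be(buf, 1, 4)?.checked_mul(2)?)?, // map32
    };
    // A value only counts if all of its bytes are already in the buffer.
    if total <= buf.len() { Some(total) } else { None }
}

fn main() {
    // A one-entry map {"Bar": "hi"} followed by the first byte of the next message.
    let buf = [0x81, 0xa3, b'B', b'a', b'r', 0xa2, b'h', b'i', 0xc0];
    println!("{:?}", msgpack_value_len(&buf)); // prints Some(8)
}
```

In principle the returned length could be passed to `self.buffer.advance(...)` after a successful `rmp_serde::from_slice` call, avoiding the re-serialization. The recursion depth follows the nesting depth of the data, so untrusted input would need a depth limit.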

Hi!
I am serializing a series of structs, and writing them individually to a binary file.
Could you please show which function in rmp::decode I should use to get the packed struct size? I tried several, and rmp::decode::read_map_len seems to be the most suitable choice; however, it returns 1, which is clearly not the size.

And could you please elaborate on the 'using something else around msgpack messages'? Do you suggest just writing custom bytes after each struct, and then splitting the data by that divider?

Thanks in advance.

UPD: as a workaround, one can simply write the size of the serialized struct before the actual data:

// encoding (sketch: `writer` is any `std::io::Write`, `record` is a `Record`)
let bytes = rmp_serde::to_vec(&record)?;
// fixed-width little-endian length prefix, so the file is portable
writer.write_all(&(bytes.len() as u64).to_le_bytes())?;
writer.write_all(&bytes)?;

// decoding (`buf` is a `&[u8]` holding the whole file)
while !buf.is_empty() {
  // first read the size of the packed data (8-byte little-endian prefix)
  let size = u64::from_le_bytes(buf[..8].try_into().unwrap()) as usize;
  // trim the buffer so that it starts with the actual data
  buf = &buf[8..];
  // parse the serialized record
  let record = rmp_serde::from_slice::<Record>(&buf[..size])?;
  // cut out the data
  buf = &buf[size..];
}
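
The same idea, extended to a stream where a frame may arrive in pieces (as with the TcpStream in the original question), might look like the sketch below. It uses only the standard library; `write_frame`, `take_frame`, and the 4-byte big-endian prefix are choices made for this illustration, and a `Vec<u8>` stands in for `BytesMut`.

```rust
/// Width of the length prefix in bytes (an assumption of this sketch).
const PREFIX: usize = 4;

/// Appends one frame to `out`: a big-endian u32 length followed by the payload.
fn write_frame(out: &mut Vec<u8>, payload: &[u8]) {
    assert!(payload.len() <= u32::MAX as usize, "payload too large for a u32 prefix");
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
}

/// Removes and returns the payload of the first complete frame in `buf`,
/// or returns `None` (leaving `buf` untouched) if more bytes are needed.
fn take_frame(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    let mut header = [0u8; PREFIX];
    header.copy_from_slice(buf.get(..PREFIX)?);
    let end = PREFIX.checked_add(u32::from_be_bytes(header) as usize)?;
    if buf.len() < end {
        return None; // header seen, but the payload has not fully arrived
    }
    // Drain the whole frame and keep everything after the prefix.
    Some(buf.drain(..end).skip(PREFIX).collect())
}

fn main() {
    let mut wire = Vec::new();
    write_frame(&mut wire, b"first");
    write_frame(&mut wire, b"second");

    // Simulate a read that stops in the middle of the second frame's header.
    let mut buf = wire[..12].to_vec();
    assert_eq!(take_frame(&mut buf), Some(b"first".to_vec()));
    assert_eq!(take_frame(&mut buf), None);

    // The rest arrives; the second frame is now complete.
    buf.extend_from_slice(&wire[12..]);
    assert_eq!(take_frame(&mut buf), Some(b"second".to_vec()));
    assert!(buf.is_empty());
}
```

Each payload returned by `take_frame` holds exactly one message, so it can be handed to `rmp_serde::from_slice` without needing to know how many bytes the deserializer consumed.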