3Hren/msgpack-rust

Can't deserialize entire file

StuartHadfield opened this issue · 8 comments

I can't deserialize an entire file because the Deserializer does not implement into_iter as other serde libraries do.

How can I get around this?

Code thus far is:

use std::fs::File;
use std::io::{BufReader, BufWriter, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file_path = "./src/foo.msgpack";
    // Buffered reader over the MessagePack input, buffered writer for the JSON output.
    let reader = BufReader::new(File::open(file_path)?);
    let writer = BufWriter::new(File::create("./src/results.json")?);

    let mut deserializer = rmp_serde::Deserializer::from_read(reader);

    // let mut serializer = serde_json::Serializer::new(io::stdout());
    let mut serializer = serde_json::Serializer::pretty(writer);

    // Transcodes a single top-level value; anything after it in the file is left unread.
    serde_transcode::transcode(&mut deserializer, &mut serializer)?;
    serializer.into_inner().flush()?;

    Ok(())
}

How can I get around this?

Make a PR that adds into_inner

@kornelski 🤔 do you mean into_iter, not into_inner?

(I'm happy to have a bash, but I'm a real newbie to Rust, so not sure I'll manage haha)

I assume you mean into_inner, because Iterator doesn't make sense here.

Ah... Hmmm 🤔 What does into_inner look like?

I was thinking of an iterator because that seems to be how Python's msgpack implementation works (https://github.com/msgpack/msgpack-python/blob/500a238028bdebe123b502b07769578b5f0e8a3a/msgpack/_unpacker.pyx#L539-L540).

into_inner conventionally just returns the wrapped object, right? So we'd return the Reader? Which means we can...?

Also - into_inner is already implemented for Deserializer
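
For readers unfamiliar with the convention being discussed, here is a minimal sketch of what into_inner usually means in Rust. It uses only the standard library; the PassThrough wrapper and the split_read helper are hypothetical names made up for this illustration and are not part of rmp_serde.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical wrapper that simply forwards reads to an inner reader.
struct PassThrough<R> {
    inner: R,
}

impl<R: Read> PassThrough<R> {
    fn new(inner: R) -> Self {
        Self { inner }
    }

    /// Consumes the wrapper and hands back the wrapped reader.
    fn into_inner(self) -> R {
        self.inner
    }
}

impl<R: Read> Read for PassThrough<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }
}

/// Reads `n` bytes through the wrapper, then reads whatever is left
/// directly from the recovered inner reader.
fn split_read(data: &[u8], n: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut wrapped = PassThrough::new(Cursor::new(data));
    let mut head = vec![0u8; n];
    wrapped.read_exact(&mut head)?;

    // The wrapper is gone after this call; the cursor keeps its position.
    let mut inner = wrapped.into_inner();
    let mut tail = Vec::new();
    inner.read_to_end(&mut tail)?;
    Ok((head, tail))
}
```

Calling split_read(b"hello world", 5) reads "hello" through the wrapper and " world" from the recovered cursor. Note that this only recovers the reader; it does not by itself iterate over further values.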

In that case I'm completely confused about what you want.

Serde fundamentally creates a single object of a given type. There is nothing to iterate in the decoder. Even if you deserialize a vector, you iterate the vector, not the decoder.

I thought you meant into_inner that returns the io::Reader so that you can recycle it for other I/O operations. That's not related to iteration.

Ah - okay - let me clarify.

If you have serialized the following array of objects into msgpack:

{
  "foo": "bar"
},
{
  "lorem": "ipsum"
}

We should be able to read all of them out of a file stream. However, once rmp_serde reaches the end of the first object (probably some delimiting character?), it concludes decoding, even though there is plenty of data still to be read from the buffer. You can see this by comparing the bytes read by fs::read with what rmp_serde decodes.
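
On the question of a delimiter between objects: MessagePack has none. Each value starts with a type byte that also encodes its length or element count, so a decoder knows where the first value ends without looking at what follows. The sketch below illustrates this with hand-encoded bytes for the two maps above. It only understands the fixmap and fixstr type tags, the function names are hypothetical, and it is not how rmp_serde is implemented.

```rust
/// Returns the length in bytes of the MessagePack value at the start of `buf`.
/// Only fixmap and fixstr are handled; any other tag yields None.
fn value_len(buf: &[u8]) -> Option<usize> {
    let tag = *buf.first()?;
    match tag {
        // fixstr: 0xa0..=0xbf, the low 5 bits are the string's byte length.
        0xa0..=0xbf => {
            let len = 1 + (tag & 0x1f) as usize;
            if buf.len() >= len { Some(len) } else { None }
        }
        // fixmap: 0x80..=0x8f, the low 4 bits are the number of key/value pairs.
        0x80..=0x8f => {
            let pairs = (tag & 0x0f) as usize;
            let mut pos = 1;
            for _ in 0..pairs * 2 {
                pos += value_len(buf.get(pos..)?)?;
            }
            Some(pos)
        }
        _ => None,
    }
}

/// Splits a buffer of back-to-back values into one slice per value.
fn split_values(mut buf: &[u8]) -> Option<Vec<&[u8]>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        let len = value_len(buf)?;
        let (head, tail) = buf.split_at(len);
        out.push(head);
        buf = tail;
    }
    Some(out)
}

/// {"foo": "bar"} followed directly by {"lorem": "ipsum"}, encoded by hand.
fn example_bytes() -> Vec<u8> {
    let mut bytes = vec![0x81, 0xa3];
    bytes.extend_from_slice(b"foo");
    bytes.push(0xa3);
    bytes.extend_from_slice(b"bar");
    bytes.extend_from_slice(&[0x81, 0xa5]);
    bytes.extend_from_slice(b"lorem");
    bytes.push(0xa5);
    bytes.extend_from_slice(b"ipsum");
    bytes
}
```

Here value_len(&example_bytes()) covers only the first map, which matches the observation that decoding one value leaves the rest of the buffer untouched; split_values walks the whole buffer by repeating that step.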

I thought about into_iter after seeing it in the json implementation of serde - https://docs.rs/serde_json/latest/serde_json/de/struct.Deserializer.html#method.into_iter.

Does that make any more sense @kornelski ?

I don't think that's a correct usage of serde. Serde is a type-based one-shot deserializer, not a streaming deserializer. It gives you one and exactly one object of the type you've requested. If you've requested a single struct, that's all you will ever get. Two objects next to each other are not a type. If you have multiple objects to deserialize with serde, then deserialize them all into a single Vec<Object>.
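
To make the distinction concrete at the byte level, here is a small sketch, assuming the data consists of short string-to-string maps like the ones above. Two maps written back to back are two top-level values, while a Vec of the same maps would normally be encoded as one MessagePack array, which is the same bytes behind a single array header. The helper names are hypothetical and the bytes are encoded by hand rather than through rmp_serde.

```rust
/// Hand-encodes a one-entry map of short strings (fixmap + fixstr only).
/// Panics if either string is longer than 31 bytes.
fn encode_pair(key: &str, value: &str) -> Vec<u8> {
    assert!(key.len() < 32 && value.len() < 32);
    let mut out = vec![0x81]; // fixmap with one key/value pair
    for s in [key, value] {
        out.push(0xa0 | s.len() as u8); // fixstr header with the length
        out.extend_from_slice(s.as_bytes());
    }
    out
}

/// Two maps written back to back: two separate top-level values.
fn concatenated() -> Vec<u8> {
    [encode_pair("foo", "bar"), encode_pair("lorem", "ipsum")].concat()
}

/// The same two maps as elements of one array: a single top-level value.
fn as_array() -> Vec<u8> {
    let mut out = vec![0x92]; // fixarray with two elements
    out.extend(concatenated());
    out
}
```

The only difference between the two encodings is the leading 0x92 byte, but it decides whether the file holds one value of a list type or several independent values.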