dtolnay/serde-yaml

Deserialize empty file to struct containing no required fields

forbjok opened this issue · 6 comments

... even though there are no non-optional fields in the struct.
Or any fields at all, for that matter.

Attempting to deserialize an empty file (or a file containing only newlines) produces the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: EndOfStream', /checkout/src/libcore/result.rs:916:4

IMO deserializing an empty file should succeed as long as there are no required fields in the struct.

(I'm using serde_yaml 0.7.3)

Thanks! I changed the title because I was not able to reproduce the panic -- probably the panic is coming from an unwrap() in your code.

I would welcome a PR to support deserializing an empty YAML input to a struct containing no required fields.

A minimal test case:

fn main() {
    let value: serde_yaml::Value = serde_yaml::from_str("").unwrap();
    println!("value = {:?}", value);
}

Should result in a YAML null, I think (the spec isn't too clear here, IMO), but instead panics. (PyYAML, for comparison, returns None/null.)

In my case, the file isn't literally empty, but contains only comments.

I tried to coerce the value by doing from_str("!!map") … PyYAML errors on that (it wants a mapping to follow the !!map) but it parses in serde_yaml … to a String(""), which also seems wrong.

I'd like to write a PR for this, but I'm not familiar with serde internals. I added some tests to get started.

In the case of a completely empty document (no --- document start) the event loader will be totally empty. Looking at the code it looks like src/de.rs:1039 is the cause for failure for a completely empty document. But I'm not sure how to detect whether or not the type we're deserializing T is tagged as #[serde(default)], if that makes any sense. Any advice would be appreciated.

So I've looked into this too and I have a bit more information on the matter. The #[serde(default)] annotation is handled by serde and we don't have access to a restriction like T: Default either of course. So my understanding is that we'd have to step through the deserializers in a way that constructs a proper, but empty, value.

The deserializer called first for an empty file is deserialize_map, so that's already nice since a map can be empty (in theory). Of course we can't just construct an empty value since there's only access to V::Value from the visitor, no concrete type. But it turns out patching this line to return None on end_of_stream will help out a bit.

Since this is an interruption in an active map, that means that end_mapping will fail though, but it's simple enough to temporarily disable that error here.

At this point it's possible to just call return self.visit_mapping(visitor); whenever self.next() returns an error in deserialize_map and I think that should have successfully created an empty mapping. The tests all still work too, but the problem still isn't solved.

And this is where I don't quite understand what's going on. Next the deserialize_any method is called, but I'm not sure why. And making any significant changes here just blows everything up.

Obviously the other changes also change serde_yaml's behavior, so it would for example not error correctly anymore when reaching EOS in the middle of a mapping. So I'm relatively certain this is the wrong approach, but there's just enough indirection that the solution isn't quite clear to me.

@dtolnay are you aware of which are of code this change is likely required to touch? I've seen that toml which handles things properly seems to take a completely different approach to serde (it uses deseralize_any to just create a map for everything) so hopefully this can be solved without a complete rewrite? From the code I've touched it seems like things just blow up all over the place with an empty stream.

jrray commented

It would be helpful for us if at least the error raised when an empty document is detected (if I'm interpreting this code correctly) was a different error type and distinguishable from other unexpected EOF cases.

This is fixed in 0.9.

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Struct {
    optional: Option<String>,
}

fn main() -> Result<(), serde_yaml::Error> {
    let value: Struct = serde_yaml::from_str("")?;
    println!("{:#?}", value);
    Ok(())
}
Struct {
    optional: None,
}