ron-rs/ron

Round-trip failure: untagged enum containing Vec variant is improperly deserialized

Closed this issue · 13 comments

Consider the following types:

#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(untagged)]
enum MyValue {
    Int(i64),
    String(String),
    Enum(Enum),
    List(Vec<MyValue>),
}

#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
enum Enum {
    First(String),
    Second(i64),
}

Let's try serializing a MyValue::Enum value like the following: MyValue::Enum(Enum::First("foo".to_string())). Its RON-serialized value is First("foo").

That First("foo") value unfortunately deserializes to MyValue::List([MyValue::String("foo")]) instead of our original enum-based value. This is surprising to me, because the deserialization code decides to ignore the First prefix entirely, and skip forward until the ( then deserialize a list with it. It's especially surprising because the MyValue::Enum variant is before the MyValue::List variant, and yet we still get a MyValue::List back instead of MyValue::Enum.

Here's a complete repro:

#[cfg(test)]
mod tests {
    use serde::{Serialize, Deserialize};

    #[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
    #[serde(untagged)]
    enum MyValue {
        Int(i64),
        String(String),
        Enum(Enum),
        List(Vec<MyValue>),
    }

    #[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
    enum Enum {
        First(String),
        Second(i64),
    }

    fn serialize_then_deserialize(value: &MyValue) -> MyValue {
        ron::from_str(ron::to_string(value).unwrap().as_str()).unwrap()
    }

    #[test]
    fn test() {
        let value = MyValue::Enum(Enum::First("foo".to_string()));
        let deserialized = serialize_then_deserialize(&value);
        assert_eq!(value, deserialized, "Serialized as: {}", ron::to_string(&value).unwrap());
    }
}

This fails with:

running 1 test
thread 'file::tests::test' panicked at 'assertion failed: `(left == right)`
  left: `Enum(First("foo"))`,
 right: `List([String("foo")])`: Serialized as: First("foo")', src/file.rs:X:9

Thanks to everyone that contributes to building and maintaining this awesome library! Any advice on avoiding this issue would be much appreciated.

I'll take a look now

Ok, I think I can at least explain what's happening here. #[serde(untagged)] first deserializes into a private Content value type to buffer what has been found. Then all variants attempt to deserialize from that Content type in order, and the first one to succeed returns. The issue is that RON is more expressive than serde's data model, so when the Content type is created with deserialize_any, the information that First("foo") looks like a tuple struct with name First is lost and all that remains is the information that it looks like a tuple, i.e. sequence ("foo",). @torkleyy would this issue be fixed in your future reboot? I'm not sure if this can be fixed in RON as the encoding through serde isn't lossless (unlike for JSON), but I'll see if I can find something.

#253 might be related, I'll check if it would fix the issue

With #253 the test case would pass, but:

1. Unit variants now deserialize to Value::String instead of Value::Unit:

// No longer works
assert_eq!("Foo".parse(), Ok(Value::Unit));
// This is now the new result
assert_eq!("Foo".parse(), Ok(Value::String(String::from("Foo"))));

2. A potential bug with earlier untagged enum variants is fixed as the extraneous If tag is no longer accepted in:

BuildSystem(
    version: "1.0.0",
    flags: [
        "--enable-thing",
        "--enable-other-thing",
        If("some-conditional", ["--enable-third-thing"]),
    ]
)

3. Anything that looks named-struct-like now deserializes to a singular Value::Map with one key, the struct name, and one value, the inner content:

assert_eq!("Room(width: 20)".parse(), Ok(
    Value::Map(
        vec![(
            Value::String("Room".to_owned()),
            Value::Map(
                vec![(
                    Value::String("width".to_owned()),
                    Value::Number(Number::new(20)),
                )].into_iter().collect(),
            ),
        )].into_iter().collect(),
    )
);

4. Newtype structs no longer deserialize if going through Value. If the struct name is not mentioned, e.g. (a: 42), this can be fixed with a small change in #253. However, if the struct name is mentioned, e.g. Newtype(a: 42), the pass through Value no longer works as the extra map layer is not expected by the newtype struct deserialization.

@torkleyy @kvark What do you think about this tradeoff? I could try to reopen and refactor #253.

Maybe at least part of #4 could be solved by adding some extra functionality to Value when doing non-self-describing deserialization.

Ok, I played around with it a bit more to get a feel of what a rough implementation could look like. It's essentially #253 but with (a) a fix that makes unnamed struct-likes work again (though this fix requires a lot more though to ensure it actually works in all edge cases) and (b) special handling in the Value Deserializer impl to get all types of structs and enums to properly roundtrip through Value. Some of that is fixing issues that #253 introduces to get untagged enums to work, but I think getting enums to roundtrip through Value might be new? Most of the new Value code is adapted from serde internals for now and could be cleaned up a lot.

You can have a look here master...MomoLangenstein:253-untagged-enums to see roughly what would be changed.

I guess my summary is that

  • it is possible
  • only Value and deserialize_any are affected at all
  • the semantics of Value change a bit as anything struct-like including enums are now encoded in the JSON way serde forces on us if we want to use untagged enums (i.e. they are wrapped in a 1-element map that goes from name to content)
  • it requires a lot of extra code in the Value Deserializer but it should only add functionality

@MomoLangenstein has there been any further development on this issue?

@NiklasEi I haven't done any more work on this since. If there is interest, I could definitely have another look and see if I could resolve some of the issues I came across during the initial attempt. However, this method seems to require serde's unfortunate enum workaround in any case ... which is unfortunate given that ron is more expressive than json. So if we'd want this to work, it might require a tradeoff for deserialize_any and Value. @torkleyy @kvark What would be your thoughts on this? Would it be worth looking into further?

juntyr commented

Thank you @obi1kenobi for providing a very good test case - I just realised that #451 fixed the issue and have added a test to ensure it remains working.

My pleasure, thanks for the fix! 🙌