dtolnay/serde-yaml

[Bug] Newtype enum variant deserialization failure when used with `#[serde(flatten)]`

cyqsimon opened this issue · 9 comments

TL;DR

If you create an enum with a newtype variant, put the enum in a struct, and then embed that struct in another struct with #[serde(flatten)], the outer struct can no longer be deserialized (serialization is unaffected).

Deserialization fails with message: untagged and internally tagged enums do not support enum input.


That's not a very helpful description, so here's an MRE:

use serde::{Deserialize, Serialize};

fn main() {
    // this works
    let gender = Gender::Other("F22".into());
    assert_eq!(
        &gender,
        &serde_yaml::from_str(&dbg!(serde_yaml::to_string(&gender).unwrap())).unwrap()
    );

    // this still works
    let info = Info { gender };
    assert_eq!(
        &info,
        &serde_yaml::from_str(&dbg!(serde_yaml::to_string(&info).unwrap())).unwrap()
    );

    // this errors
    let user = User { info };
    assert_eq!(
        &user,
        &serde_yaml::from_str(&dbg!(serde_yaml::to_string(&user).unwrap())).unwrap()
    );
}

#[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize)]
enum Gender {
    Female,
    Male,
    Other(String),
}

#[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize)]
struct Info {
    gender: Gender,
}

#[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize)]
struct User {
    #[serde(flatten)]
    info: Info,
}

Misc

I tried the exact same example with serde_json and got no errors, so I think this is a serde_yaml issue rather than a serde issue.

A quick search did find this issue though, not sure if it's related: ron-rs/ron#217.

Versions

  • OS: Linux 6.0.8 x86_64
  • rust: 1.65.0 (stable-x86_64-unknown-linux-gnu)
  • serde: 1.0.147
  • serde_yaml: 0.9.14

I just experimented with different enum representations. The results may be interesting for diagnosis:

With #[serde(untagged)]

Deserialization of Gender::Other now works. But obviously I can't actually use this because Gender::Male and Gender::Female both serialize to null.

With #[serde(tag = "foo")]

Serialization fails with message: cannot serialize tagged newtype variant Gender::Other containing a string.

This is expected: internal tagging stores the tag as a field inside the variant's content, and a plain string has no fields to hold it.

With #[serde(tag = "foo", content = "bar")]

This no longer errors, but of course it's very undesirable.

To add some maybe helpful information: This was introduced with the !Tag syntax of 0.9. Using serde_yaml with 0.8 works perfectly fine with #[serde(flatten)].

luben commented

Also, if you serialize with 0.8 (which does not use tags) you can deserialize with 0.9, but an item serialized with 0.9 cannot be deserialized with either version.

Same problem here. Had not expected that serde_yaml's Serializer would use a format that serde_yaml's Deserializer cannot parse, but this is exactly what is happening here.
Have downgraded our project to serde_yaml 0.8.26 for now.

CC @dtolnay

Your "F-22" example resembles a transphobic joke. Consider changing it.

No. PC's gone mad these days.

[Edit] rant deleted.

I am seeing this issue as well with the flatten and deserialization of enums.

It seems that inside a field marked with #[serde(flatten)], the expected format for externally tagged enums changes to what serde_json uses:

use indoc::indoc;
use serde::Deserialize;
use std::{collections::HashMap, fmt::Debug};

#[derive(Deserialize, Debug)]
enum Inner {
    A,
}

#[derive(Deserialize, Debug)]
struct Outer {
    #[serde(flatten)]
    flattened_field: HashMap<String, Inner>,
}

fn main() {
    // doesn't work (as expected)
    println!("{:?}", serde_yaml::from_str::<Inner>(indoc! {"
        A: null
    "}));
    // works (as expected)
    println!("{:?}", serde_yaml::from_str::<Inner>(indoc! {"
        !A
    "}));
    // works (bug)
    println!("{:?}", serde_yaml::from_str::<Outer>(indoc! {"
        blah: { A: null }
    "}));
    // doesn't work (bug)
    println!("{:?}", serde_yaml::from_str::<Outer>(indoc! {"
        blah: !A
    "}));
}

This seems to be caused by #[serde(flatten)] generating code that goes through serde::__private::de::content::Content for both keys and values before deserializing to the final types present in the hashmap. When serde_yaml sees the !A, it calls .visit_enum() on the corresponding serde::[...]::ContentVisitor, which results in the same "untagged and internally tagged enums do not support enum input" error that occurs in cyqsimon's original code.
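To make the failure path described above easier to follow, here is a minimal std-only model of it. This is only a sketch: `Content`, `YamlNode` and `buffer` are hypothetical stand-ins I made up for illustration, not serde's or serde_yaml's real types, and the real code works through visitors rather than a tree walk.

```rust
/// Hypothetical stand-in for serde's private buffered `Content`.
/// Note that it has no variant that can hold a tagged enum.
#[derive(Debug, PartialEq)]
enum Content {
    Null,
    Str(String),
    Map(Vec<(String, Content)>),
}

/// Hypothetical stand-in for a parsed YAML node.
enum YamlNode {
    Null,
    Str(String),
    Map(Vec<(String, YamlNode)>),
    /// Models `!Tag value`.
    Tagged(String, Box<YamlNode>),
}

/// Buffers a node before the target type is known, roughly the way a
/// flattened field is buffered.
fn buffer(node: &YamlNode) -> Result<Content, String> {
    match node {
        YamlNode::Null => Ok(Content::Null),
        YamlNode::Str(s) => Ok(Content::Str(s.clone())),
        YamlNode::Map(entries) => {
            let mut out = Vec::new();
            for (k, v) in entries {
                out.push((k.clone(), buffer(v)?));
            }
            Ok(Content::Map(out))
        }
        // A tagged node is reported as enum input, which the buffer
        // cannot store, so buffering fails before `Gender` is ever consulted.
        YamlNode::Tagged(_, _) => Err(
            "untagged and internally tagged enums do not support enum input".to_string(),
        ),
    }
}
```

In this model, `gender: { Other: F22 }` buffers fine while `gender: !Other F22` fails, which mirrors the behaviour in the examples above.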

I'm not sure how to fix this. Ideally the code generated by #[serde(flatten)] would directly deserialize to the desired types, or it could go through serde_yaml's Value instead of their Content, but we can't change that from here (and they probably have a good reason for that intermediate type). We could also handle !A [...] the same as A: [...] (allowing both to deserialize to an externally tagged enum), which would fix this and make serde_yaml more compatible with JSON generated by serde_json. But that would probably break existing custom visitors that expect !A to result in a call to .visit_enum instead of .visit_map.
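The last idea (treating !A [...] like A: [...]) could look roughly like the following std-only sketch. Everything here is hypothetical: `Content`, `YamlNode` and `buffer_lenient` are toy types for illustration, not a patch against serde_yaml, and the sketch ignores the compatibility concern about custom visitors.

```rust
/// Hypothetical stand-in for a buffered value with no enum variant.
#[derive(Debug, PartialEq)]
enum Content {
    Null,
    Str(String),
    Map(Vec<(String, Content)>),
}

/// Hypothetical stand-in for a parsed YAML node.
enum YamlNode {
    Null,
    Str(String),
    Map(Vec<(String, YamlNode)>),
    /// Models `!Tag value`.
    Tagged(String, Box<YamlNode>),
}

/// Buffers a node, rewriting `!Tag value` into the single-entry map
/// `{ Tag: value }` so that both spellings end up in the same shape.
fn buffer_lenient(node: &YamlNode) -> Content {
    match node {
        YamlNode::Null => Content::Null,
        YamlNode::Str(s) => Content::Str(s.clone()),
        YamlNode::Map(entries) => Content::Map(
            entries
                .iter()
                .map(|(k, v)| (k.clone(), buffer_lenient(v)))
                .collect(),
        ),
        YamlNode::Tagged(tag, inner) => {
            Content::Map(vec![(tag.clone(), buffer_lenient(inner))])
        }
    }
}
```

With this conversion, `blah: !A` and `blah: { A: null }` buffer to the same value, so a later step that already understands the map form would accept both.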

I hope some of this could help. I'd like to fix this myself, but I don't know enough about serde/rust to come up with a good fix.