ron-rs/ron

Untagged enum roundtrip issues

juntyr opened this issue · 3 comments

The following cases do not roundtrip:

  • untagged struct variant with PrettyConfig::default().struct_names(true), since
  • raw value inside an untagged enum
    • raw values are serialized without any indication that they exist, as is their purpose
    • untagged enums deserialize into serde's content type, which will parse the raw value's content as normal RON
    • the raw value cannot be deserialized from deserialized ron
    • e.g. "\"hi\"" is the raw value "\"hi\"" but the serde content "hi"
    • this could only be solved with a custom content type, which is blocked serde-rs/serde#1183

I have given this a bit more thought. The untagged struct variant issue is caused by

  • ron has optional struct names but required variant names
  • ron does not distinguish between struct and variant names in the format
  • ron thus has to assume any named item is a variant inside serde Content's deserialize_any

We could technically fix this with significant changes to ron, e.g. making the struct_names format option a explicit_type_names extension instead which forces struct names to be serialised and changes variants to a Enum::Variant form instead. During deserialising and if the extension is enabled, we would then only report an enum to serde's Content type if it is in Enum::Variant form. Without the extension enabled, we would proceed as now, i.e. assume that any named item is a variant.

While possible, I don't think that this is a very sustainable direction for ron to take. After all, optional struct names, required variant names, and no enum names are specific features of the format, none of which cause any problems when deserialize_any is not involved.

In the end, format-specific buffering support in serde would fix this and many other issues. Unfortunately, even though this could be implemented in serde today (see serde-rs/serde#1354), dtolnay has indicated that serde will instead wait for associated type default to be stabilized, which might be quite a while.

I'm not quite sure where we go from here. Do we just document these limitations (and publish v0.9 with them)? Do we change the ron format? Do we unionise with other serde formats and demand a timeline for format-specific buffering /hj?

@torkleyy

I would leave the format as-is and document it, this seems like a very specific problem and we can't do anything about the root cause (other than waiting).

I would leave the format as-is and document it, this seems like a very specific problem and we can't do anything about the root cause (other than waiting).

Ok, that sounds reasonable! Since discovering and fixing more issues with serde's enum representations is definitely a longer project (see https://github.com/juntyr/ron/tree/fuzz-untagged for a rough start), I'd propose to push that after the v0.9 release. For now, I'll document what we know to be problematic, e.g. "ron strives to support most #[serde(untagged)] use cases. However, the following cases are known to not roundtrip: ... . Please file an issue if you come across a use case which is not listed here but still breaks."