ron-rs/ron

RON incorrectly adds extra level of Base64 when roundtripping Bytes

GoldsteinE opened this issue · 2 comments

This code (playground):

#[derive(Debug, serde::Deserialize, serde::Serialize)]
#[serde(rename = "b")]
struct BytesVal {
    pub b: bytes::Bytes,
}

#[derive(Debug, serde::Deserialize, serde::Serialize)]
#[serde(untagged)]
enum Bad {
    Bytes(BytesVal),
}

fn main() {
    let v: Bad = ron::from_str(r#"(b: "dGVzdA==")"#).unwrap();
    dbg!(&v);
    let s = ron::to_string(&v).unwrap();
    dbg!(&s);
    let v: Bad = ron::from_str(&s).unwrap();
    dbg!(&v);
    println!("---");
    let v: Bad = serde_json::from_str(r#"{"b": "dGVzdA=="}"#).unwrap();
    dbg!(&v);
    let s = serde_json::to_string(&v).unwrap();
    dbg!(&s);
    let v: Bad = serde_json::from_str(&s).unwrap();
    dbg!(&v);
}

has the following output:

[src/main.rs:23] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)
[src/main.rs:25] &s = "(b:\"ZEdWemRBPT0=\")"
[src/main.rs:27] &v = Bytes(
    BytesVal {
        b: b"ZEdWemRBPT0=",
    },
)
---
[src/main.rs:30] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)
[src/main.rs:32] &s = "{\"b\":[100,71,86,122,100,65,61,61]}"
[src/main.rs:34] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)

The value fully survives the roundtrip with serde_json, but RON adds an extra level of Base64.

This really is a tricky one! I think I now understand what is going on here, though. The problem is the #[serde(untagged)] attribute, which means that the ron string is deserialised without any type information available. ron encodes bytes as Base64 strings but does not mark them as bytes, so without type information the deserialiser just sees a string. When bytes::Bytes then deserialises, it is handed a string and simply takes that string's bytes. In other words, even though the string held Base64-encoded bytes, at no point was ron asked to decode it back into bytes. On serialisation, those already-encoded bytes are Base64-encoded again, which is the extra level visible in the output above.
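To make the mechanism concrete, here is a minimal std-only sketch of the two deserialisation paths described above. It is a model of the behaviour, not ron's actual code: the names base64_encode, base64_decode, typed_path and untyped_path are hypothetical, and the hand-rolled Base64 routines exist only to keep the example self-contained.

```rust
// Standard Base64 alphabet (RFC 4648), with '=' padding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes as a padded Base64 string.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

// Decode a padded Base64 string; returns None on an invalid character.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits = 0;
    for c in input.bytes() {
        if c == b'=' {
            break;
        }
        let v = ALPHABET.iter().position(|&a| a == c)? as u32;
        buf = (buf << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Some(out)
}

// Hypothetical model: with type information, the string is known to hold
// bytes, so it is Base64-decoded.
fn typed_path(ron_string: &str) -> Option<Vec<u8>> {
    base64_decode(ron_string)
}

// Hypothetical model: without type information (untagged enum), the value is
// just a string, and the bytes visitor takes the string's bytes as-is.
fn untyped_path(ron_string: &str) -> Vec<u8> {
    ron_string.as_bytes().to_vec()
}

fn main() {
    let input = "dGVzdA==";
    println!("typed:   {:?}", typed_path(input)); // decoded payload
    let kept = untyped_path(input);
    println!("untyped: {:?}", String::from_utf8_lossy(&kept));
    // Serialising the undecoded bytes encodes them a second time.
    println!("reserialised: {}", base64_encode(&kept));
}
```

Under this model, the untyped path followed by serialisation yields ZEdWemRBPT0=, matching the second dbg! line in the report.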

In general, ron is a format that does not support deserialising without type information (i.e. anything that touches deserialize_any). However, I agree that this is a very confusing and unfortunate case. The only fix I can think of that would work with untagged enums is a breaking change to how ron's format represents bytes. It could be as simple as prefixing every byte string with b during serialisation, i.e. b"base64". During deserialisation, we could continue to accept a string without the b prefix when type information is available. However, old ron code would no longer be able to read files produced by new code. Those would be tough decisions to make ...
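As a rough illustration of why a b prefix would help, the sketch below classifies a single literal without any type information. Everything in it is hypothetical: Literal and classify are made-up names, escape sequences are ignored, and it does not reflect how ron's parser is actually structured.

```rust
// Hypothetical token classification for the proposed syntax.
#[derive(Debug, PartialEq)]
enum Literal<'a> {
    // A plain string literal: "..."
    Str(&'a str),
    // A prefixed byte-string literal: b"..." (payload is still Base64 text)
    Base64Bytes(&'a str),
}

// Classify one literal token; returns None if it is not a quoted literal.
// Assumption: no escape handling, the token is exactly one literal.
fn classify(token: &str) -> Option<Literal<'_>> {
    if let Some(rest) = token.strip_prefix('b') {
        let inner = rest.strip_prefix('"')?.strip_suffix('"')?;
        return Some(Literal::Base64Bytes(inner));
    }
    let inner = token.strip_prefix('"')?.strip_suffix('"')?;
    Some(Literal::Str(inner))
}

fn main() {
    // With the prefix, the literal is self-describing even when untyped.
    println!("{:?}", classify(r#"b"dGVzdA==""#));
    // Without it, an untyped deserialiser can only report a string.
    println!("{:?}", classify(r#""dGVzdA==""#));
}
```

With the prefix, the self-describing path could Base64-decode the payload before handing it to a visitor, while the unprefixed form would still be accepted as bytes only when the target type asks for them.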

@torkleyy What are your thoughts on this?

I think having special built-in support for bytes / Base64 encoding can be an advantage, even if it means breaking some compatibility. I'm not sure whether we can ease the transition so that it causes no breakage in practice.