dtolnay/serde-yaml

double quotes lost when deserializing and serializing strings containing only numbers on serde_yaml 0.9

Closed this issue · 6 comments

I'm reading some YAML, editing it, and then saving it again via to_string.
For some reason I don't understand, some String values that contain only digits lose their double quotes after to_string.
I'm feeding this YAML to something that expects Strings, not numbers, and I expect serde_yaml to preserve the original type of each value.
Here's a minimal reproducible test, which passes with serde_yaml 0.8.26 but fails with 0.9:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6de99a49da249819a76e5fcd25342ec3

extern crate serde;
extern crate serde_yaml;


#[test]
fn can_serialize_with_quotes() {
    use serde_yaml::Mapping;
    let original_config = r#"---
configuration:
  agent: "007"
"#;
    let config: Mapping = serde_yaml::from_str(original_config).expect("should parse yaml");
    let config = serde_yaml::to_string(&config).expect("should serialize");
    assert_eq!(config, original_config);
}

Basically, this input:

configuration:
  agent: "007"

ends up as:

configuration:
  agent: 007

(there is also another difference between 0.8 and 0.9 regarding having or not having ---\n at the beginning of the String, but this is not relevant as far as I can tell; the issue is that a String value ends up being converted to a Number value).

Is there any way to avoid this String->Number conversion or some other way to make this test pass?

This is behaving correctly as far as I can tell. 007 is a !!str in yaml, not a !!int. If a different library you are using is interpreting it as an int, that is a bug in the other library.

Here is the spec section that determines what untagged scalars are int: https://yaml.org/spec/1.2.2/#1022-tag-resolution.

Sorry, I should have checked this properly. I think you're right. Thanks!

I'm not sure I agree with this resolution. @dtolnay any chance you can take another look?

From the linked article I see that:

Scalars with the “?” non-specific tag (that is, plain scalars) are matched with an extended list of regular expressions.

One of which is [-+]? [0-9]+, which resolves to tag:yaml.org,2002:int (Base 10).

I think 007 is a plain scalar and matches the above regular expression, so it should be interpreted as an int. Therefore the round-trip of "007" to 007 is not valid, and parsing 007 as an integer is valid.
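To make the matching step concrete, here is a minimal sketch of a check against the base-10 pattern quoted above. The function name `looks_like_core_schema_int` is hypothetical and is not part of serde_yaml or any other library; it only covers the `[-+]? [0-9]+` rule, not the octal or hexadecimal forms, and it assumes the spaces in the spec's notation are not literal characters.

```rust
/// Returns true if `scalar` matches the base-10 integer pattern
/// `[-+]? [0-9]+` from the YAML 1.2.2 tag-resolution section.
/// Hypothetical helper for illustration only.
fn looks_like_core_schema_int(scalar: &str) -> bool {
    // Strip at most one leading sign character.
    let digits = scalar
        .strip_prefix('-')
        .or_else(|| scalar.strip_prefix('+'))
        .unwrap_or(scalar);
    // Require at least one character, all of them ASCII digits.
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}
```

Under this check, the plain scalar 007 matches, which is why an emitter that wants the value read back as a string would need to quote it.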

I've tested this with yaml-rust, yq, and PyYAML, which all agree that 007 is an int.

E.g.

use serde_yaml; // 0.9.24
use yaml_rust;  // 0.4.5

fn main() {
    use serde_yaml::Mapping;

    let original_config = r#"---
agent: "007"
"#;

    let parsed_serde_yaml: Mapping = serde_yaml::from_str(original_config).unwrap();

    // serde_yaml knows it's a string
    assert_eq!(parsed_serde_yaml["agent"], serde_yaml::Value::String("007".into()));
    let serialized_serde_yaml = serde_yaml::to_string(&parsed_serde_yaml).unwrap();

    // Serializes it back to 007, no quotes.
    assert_eq!(serialized_serde_yaml, r#"agent: 007
"#);

    // serde_yaml parses it back as a string, so we're self-consistent.
    let parsed_serde_yaml: Mapping = serde_yaml::from_str(&serialized_serde_yaml).unwrap();
    assert_eq!(parsed_serde_yaml["agent"], serde_yaml::Value::String("007".into()));

    // But yaml_rust parses it as an integer
    let parsed_yaml_rust = yaml_rust::YamlLoader::load_from_str(&serialized_serde_yaml).unwrap();
    let doc = &parsed_yaml_rust[0];
    // thread 'main' panicked at 'assertion failed: `(left == right)`
    // left: `Integer(7)`,
    // right: `String("007")`',
    assert_eq!(doc["agent"], yaml_rust::Yaml::String("007".into()));
}

Yeah, good call.

Fixed in serde_yaml 0.9.25.

I'm seeing similar behaviour with 0.9.26 (but not with 0.8.26): deserializing and then serializing a quoted string produces an unquoted string in the output.

The input:

          - name: KUBERNETES
            value: "yes"

Output:

          - name: KUBERNETES
            value: yes

I'm having a similar issue with a key whose value is a string consisting of the single character y. Even if I create the mapping value as a string, y is not quoted when the YAML is output. kubectl then interprets it as a bool instead of a string value.

Reference: https://yaml.org/type/bool.html

Question: Is there a way to force values of type string to always be quoted when outputting as a string?
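In case it helps to pin down which values are affected, here is a rough sketch that flags strings a YAML 1.1 consumer would read as booleans, using the literals listed on the bool.html page referenced above. The names `is_yaml11_bool` and `quote_if_yaml11_bool` are hypothetical helpers, not serde_yaml APIs, and this does not answer whether serde_yaml itself can be configured to quote such values.

```rust
/// Plain scalars that YAML 1.1 resolves to booleans,
/// taken from https://yaml.org/type/bool.html.
const YAML11_BOOL_LITERALS: &[&str] = &[
    "y", "Y", "yes", "Yes", "YES",
    "n", "N", "no", "No", "NO",
    "true", "True", "TRUE",
    "false", "False", "FALSE",
    "on", "On", "ON",
    "off", "Off", "OFF",
];

/// Returns true if a YAML 1.1 parser would treat the unquoted
/// scalar `s` as a boolean. Hypothetical helper.
fn is_yaml11_bool(s: &str) -> bool {
    YAML11_BOOL_LITERALS.contains(&s)
}

/// Wraps `s` in double quotes when it is a YAML 1.1 boolean literal,
/// and returns it unchanged otherwise. None of the literals contain
/// characters that need escaping, so no escaping is done here.
fn quote_if_yaml11_bool(s: &str) -> String {
    if is_yaml11_bool(s) {
        format!("\"{}\"", s)
    } else {
        s.to_string()
    }
}
```

With these, `quote_if_yaml11_bool("yes")` yields the quoted form, while a value such as KUBERNETES is returned unchanged.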