ruby/psych

Quoting strings y/n when dumping?

tisba opened this issue ยท 15 comments

tisba commented

Hey there ๐Ÿ‘‹

I'm trying to wrap my head around booleans and YAML. We encountered the issue, when we tried to parse a YAML back generated by YAML.dump().

Environment: Ruby 2.7.0p0,

Example:

{"n" => "y"}.to_yaml

generates

---
n: y

When we parse this e.g. using https://github.com/go-yaml/yaml it parses as (written as Ruby):

{ false => true }

When I look at https://yaml.org/type/bool.html, I would assume that the expected generated YAML should be more like this a String rendered:

---
'n': 'y'

Am I missing something?

Some more experiments, in case that's any help, show that y, Y, n and N are not quoted as required:

# taken from https://yaml.org/type/bool.html
v = %w(y Y yes Yes YES n N no No NO true True TRUE false False FALSE on On ON off Off OFF)

v.each { |x| puts YAML.dump(x) }
# --- y
# --- Y
# --- 'yes'
# --- 'Yes'
# --- 'YES'
# --- n
# --- N
# --- 'no'
# --- 'No'
# --- 'NO'
# --- 'true'
# --- 'True'
# --- 'TRUE'
# --- 'false'
# --- 'False'
# --- 'FALSE'
# --- 'on'
# --- 'On'
# --- 'ON'
# --- 'off'
# --- 'Off'
# --- 'OFF'

According to the YAML spec, keys can be non-strings:

Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. This is non-trivial since YAML allows various ways to format a given scalar content. For example, the integer eleven can be written as โ€œ013โ€ (octal) or โ€œ0xBโ€ (hexadecimal). If both forms are used as keys in the same mapping, only a YAML processor which recognizes integer formats would correctly flag the duplicate key as an error.

As you can see here, mapping keys can be integers (and therefore, they can also be booleans, etc.).

If you want a string, you must quote it:

Psych.load("---\n1: 2\n")            # => {1=>2}
Psych.load("---\n'1': '2'\n")        # => {"1"=>"2"}
Psych.load("---\ntrue: false\n")     # => {true=>false}
Psych.load("---\n'true': 'false'\n") # => {"true"=>"false"}

This is all functioning correctly, per the spec.

Now, in YAML 1.1, a boolean can be y|Y|yes|Yes|YES|n|N|no|No|NO |true|True|TRUE|false|False|FALSE |on|On|ON|off|Off|OFF.

In YAML 1.2, a boolean can only be true|false (case-insensitive).

Psych only allows YAML 1.1, so actually it's broken and produces a string:

# Incorrect according to spec!
Psych.load("%YAML 1.1\n---\nn: y\n") # => {"n"=>"y"}

# Correct
Psych.load("%YAML 1.1\n---\nno: yes\n") # => {false=>true}
Psych.load("%YAML 1.1\n---\non: off\n") # => {true=>false}

So actually there's a minor bug in Psych.

Anyway, to fix your issue, you must single/double quote the key, per the YAML spec.

Or, you could use YAML 1.2 for go-yaml, but this will raise an error with Psych:

%YAML 1.2
---
n: y

Forgot to talk about dump, which is also broken:

# Broken! Should add quotes!
puts ({'y'    => 'n'    }).to_yaml # => y: n

puts ({'yes'  => 'no'   }).to_yaml # => 'yes': 'no'
puts ({'true' => 'false'}).to_yaml # => 'true': 'false'
puts ({true   => false  }).to_yaml # => true: false

I tried fixing it in a hacky way, but doesn't work:

require 'psych'

class NotBool < String
  def encode_with(coder)
    coder.scalar = self
    #coder.scalar = "'#{self}'" # This doesn't work either
    coder.style = Psych::Nodes::Scalar::SINGLE_QUOTED
    coder.tag = nil
  end
end

# Doesn't add quotes
puts ({NotBool.new('y') => NotBool.new('n')}).to_yaml

The only way I could get it to work is the super hacky way:

require 'yaml'

x = {'_notbool_y_notbool_' => '_notbool_n_notbool_'}

puts x.to_yaml.gsub('_notbool_',"'")

In short, never dump a String containing only y or n until it's fixed.

According to test_boolean.rb (test_y and test_n), I guess this isn't a bug. It says that Syck treats y/n as strings on purpose.

However, IMO, that's broken and not following YAML 1.1 spec, and therefore makes potentially incompatible YAML files for other YAML parsers to consume. It also seems to be bug to not allow adding quotes using the coder/encode_with.

It looks pretty easy to change the Ruby code though. Maybe I'll write a pull request and the Psych team can decide if want it or not.

tisba commented

@esotericpig The problem in our case is, that part of the data we serialise is user-provided. So we cannot "not" support "y", "n" etc.

The workaround for us was in this case to update the Golang library which in the newer version only supports YAML 1.2.

The solution before was what you did: Generate a unique string and replace it in the output. That's actually very messy, but it works.

I could understand when this won't be accepted as a general change, because it could lead to problems with applications relying on this buggy behaviour. But at least a "strict" flag or something like that for dump and load would be nice.

tisba commented

Oh and actually, that's also not following the YAML 1.0 spec. "y" should has to be encoded as a string while true maybe encoded as y (without quotes).

tisba commented

I'm not sure I'm reading #448 correctly, @esotericpig. It looks like it only changes parsing YAML, not generating, right? So the actual issue won't be fixed?

Oh and actually, that's also not following the YAML 1.0 spec. "y" should has to be encoded as a string while true maybe encoded as y (without quotes).

According to YAML 1.1 spec, true and y without quotes are translated to the language booleans. And yes, "y" with quotes should be translated to just a string. My wording was probably confusing, sorry.

I'm not sure I'm reading #448 correctly, @esotericpig. It looks like it only changes parsing YAML, not generating, right? So the actual issue won't be fixed?

Even though I only changed one spot in the code, it affects both. I tested it again to make sure:

irb(main)> {running: 'y'}.to_yaml
=> "---\n:running: 'y'\n"
irb(main)> {running: true}.to_yaml
=> "---\n:running: true\n"

It correctly quotes the y. You'll never do this in code (unless it's a variable):

{running: y}.to_yaml

And here is parsing:

irb(main)> Psych.load("---\n:running: true\n")
=> {:running=>true}
irb(main)> Psych.load("---\n:running: y\n")
=> {:running=>true}
irb(main)> Psych.load("---\n:running: 'y'\n")
=> {:running=>"y"}

===

Unfortunately, it looks like I need to change the code to use a strict flag instead after code review, but not sure if I need to wait for pull/426 first.

This is similar to #387 where to_yaml doesn't quote things like some mac addresses. Parsers that correctly implement the (somewhat stupid) 1.1 spec then load them as integers.

In short, never dump a String containing only y or n until it's fixed.

Yes, we should put quotes around "y" and "n" when dumping (regardless of "strictness"). Loading seems a bit more tricky because people might be expecting the current behavior.

Looks like I did some work here related to #426. If someone could incorporate that in to a PR I would merge it.

simi commented

What about this case? Would it be possible to quote it as well?

YAML.dump('A' => '2020-12-31')
=> "---\nA: 2020-12-31\n"

YAML.load(YAML.dump('A' => '2020-12-31'))
=> Tried to load unspecified class: Date (Psych::DisallowedClass)
tisba commented

That is obviously a bug too. You should create a new dedicated issue for that.

simi commented

@tisba I'll try to prepare PR.

tisba commented

I cannot reproduce your issue with 4.0.3, @simi. This works fine for me

YAML.load(YAML.dump('A' => '2020-12-31'))
 => {"A"=>"2020-12-31"}
simi commented

I cannot reproduce your issue with 4.0.3, @simi. This works fine for me

YAML.load(YAML.dump('A' => '2020-12-31'))
 => {"A"=>"2020-12-31"}

I have found out the problem is related actually to result of Date.strptime('2020-12-31', '%F', Date::GREGORIAN). In some environments that can fail and since that string is not considered date, it is not properly quoted.

My problem was caused by https://github.com/travisjeffery/timecop/blob/864bcf16f70f31a8e125cc3135889067fdd2a373/lib/timecop/time_extensions.rb#L48-L51. Patching stdlib is wrong, I think there's nothing we can improve on Psych side actually.