Colons in keys and string values
Opened this issue · 7 comments
Almost every docker-compose.yml
in the world has multiple lines like this:
image: postgres:9.3
A lot of Ruby projects contain YAML where strings begin with colons, which is abominable but a fact of life:
:foo: bar
I'm working on a GitLab CI file at the moment which has colons in the middle of keys:
foo:bar: "value"
All of these should be fixed as follows:
image: "postgres:9.3"
":foo": "bar"
"foo:bar": "value"
foo:bar: "value"
This is way way harder to disambguate
Actually, I mis-reported what the YAML looks like there - these are keys for sub-maps:
foo:bar:
a: 1
b: 2
Which ideally we'd fix like this:
"foo:bar":
a: 1
b: 2
Does that make it easier?
Yes and No?
:foo:bar:baz
is this ':foo: bar:baz
or :foo:bar: baz
If you can define the behaviour, I can implement it, but It's kinda regex dependent
At some point it stops being safeyaml and starts being YAML when we can't easily disambiguate keys
At the moment, how keys are handled, It scans for a string, in ""s or ''s, or an identifier-like pattern followed by a :
, then checks for a
.
For values, it tries to read a 'string without special characters', like :
etc. When fix-unquoted is given, it doesn't complain.
To quote daniel, "Barewords are a clusterfuck"
{a:1:b:2}
This parses under indentifier rules, and skips the
after the :
(iirc without one of the fix options), but if :
is allowed inside keys, well, why not ',' ?
(Aside: If someone uses '0x0FFF' as a number, it won't parse, because barewords currently exclude things with leading digits or +/- as being anything but numbers. )
The change here might also impact on nested keys name: name: foo
too.
On the other hand ...
So far, we do this sort of thing
- check for
-
a list indent, or[
,{
, - if
+-0123456789
then error out, numbers can't be keys or outside of[]
, or{}
's - if a
"
or'
, then scan a string, and expect ': ' or ':\n' - scan for an identifier followed by ':'
We could change that last one:
- split on the last ':' followed by a space or a newline (i.e rightmost)
- if there's more than one, or less than one, error
- if it's at end of line, expect an indented block
- otherwise, key is before ': '
But the weird bit comes into how we handle the value:
on: true/false
If we try parsing the value, we'd stop at true
.
And it impacts nesting: - name: value
means a -
has to do these checks too
This can also impact how things appear inside of {}
's, which sucks.
I do feel the whole point of safeyaml is 'stop writing something so ambigious in the first place', but I do feel like a repair option might be necessary
If it makes things easier, image: postgres:9.3
is probably the most common of these by far. Repairing :foo: bar
, foo:bar: <scalar>
, and foo:bar:\n<indented map/list>
is way less important imo.
From what I understand from the scanning logic, to repair that first case, we'd have to include :
in the acceptable characters when scanning after encountering <identifier><colon><whitespace>
?