toml-lang/toml

Can comments appear before commas in in-line arrays?

Validark opened this issue ยท 12 comments

Arrays can span multiple lines. A terminating comma (also called trailing comma) is ok after the last value of the array. There can be an arbitrary number of newlines and comments before a value and before the closing bracket.

My question is this: Can comments appear before commas in in-line arrays?

integers3 = [
  1,
  2 # is this a valid toml document?
  , 3
]

According to the ABNF file, the above is not a valid toml document.

However, tomlplusplus parses the above document just fine: https://godbolt.org/z/f6MG38

This is a good question. Given that I've recently verified toml++'s compliance against the two canonical TOML compliance test suites (BurntSushi's and iarna's), this implicitly means that any other parser passing those might still have a discrepancy here as well.

My hope is that this part of the spec is just a case of unclear wording and this should actually be valid, as it would be in just about any other language with single-line comments. As far as the parser is concerned it's much simpler to allow this than to prohibit it. It's also not unheard-of for people to use leading commas instead of trailing commas when splitting things over multiple lines, as odd as it looks.

I'm inclined to agree with @marzer here. SQL coders tend to use preceding commas in select clauses, and I've written enough SQL to understand why.

The ABNF would be easy to change. Replace ws with ws-comment-newline on two lines, then shorten up the lines with further refinement. The language of the spec is oddly specific: comments are allowed before values, and before closing brackets. That ought to be changed so that comments (with newlines) are allowed between values, commas, and brackets

I agree, every major language I'm aware of allows this. The spec likely just forgot to mention this case. Perhaps it should instead say something like:

"Arrays can span multiple lines. A terminating comma (also called trailing comma) is permitted after the last value of the array. Any number of newlines and comments may precede values, commas, and the closing bracket."

As it stands, the spec (both written as ABNF) is unambiguous: newlines and comments may appear before values and before the closing bracket, BUT NOT before commas.

This also means that this (odd, but occasionally used) style is forbidden even in the absence of comments:

# THIS IS (as of now) NOT A VALID TOML DOC!
values = [
    1
  , 2
  , 3
]

I suppose this feature was modeled after TOML's key/value pairs โ€“ values must start on the line where the corresponding key ends, with no line breaks or comments allowed before or after the equals sign. Array elements and the commas that follow them were seen as belonging together in the same way. It makes certainly sense for key/value pairs, but for array elements?? Personally, I'm inclined to follow the other commenters here and plead for the relaxation of this restriction. As long as a comma appears, it should not matter whether it's preceded or followed by line breaks and comments.

This means I'm in favor of changing the ABNF as suggested by @eksortso and the written spec as suggested by @Validark.

One thing to consider, though: this is a nontrivial change, since it will force all compliant parsers to modify their behavior (allowing things that were formerly forbidden). RC 2 was recently released and I had hoped it could be promoted to 1.0 Final without further changes. But this change means that an RC 3 is called for before we can hope for Final. Well, I guess it can't be helped...

One thing to consider, though: this is a nontrivial change, since it will force all compliant parsers to modify their behavior (allowing things that were formerly forbidden).

I believe most spec-compliant parsers are likely to already allow this behavior. The spec NEEDS to be updated one way or the other. If newlines should not be allowed before commas in TOML documents, that would need to be added to the spec so implementations actually know they need to ban this. As it sits, I am guessing people like @marzer just assumed the case being discussed here is fine, as it should be.

I'm writing this comment because the commit is linked to here. It is not merged yet but seems to work fine.

Actually, the current version of toml11, another C++ implementation, does not allow this. It only allows whitespaces, as described in the ABNF. There could be more implementations that are aware of this case. As @ChristianSi pointed out, the current spec does not allow comments before a comma explicitly and the toml.abnf clearly inhibit this. So currently there is no reason to allow it. Although the toml.abnf says it is not authoritative yet, it is so unambiguous, any implementation that refers to it could check this case.

I'm rather positive to this change because it looks legal. But there could be some implementations that does not allow this currently.

As it sits, I am guessing people like @marzer just assumed the case being discussed here is fine, as it should be.

Yup, guilty as charged. I'm very surprised this wasn't legal.

PR #767 is introduced to update the specification. I don't know if it's worth adding examples to toml.md to make this clear, but my intuition say it's not.

I agree, every major language I'm aware of allows this. The spec likely just forgot to mention this case. Perhaps it should instead say something like:

"Arrays can span multiple lines. A terminating comma (also called trailing comma) is permitted after the last value of the array. Any number of newlines and comments may precede values, commas, and the closing bracket."

Hope you don't mind, @Validark, but I used your exact text in the PR. It was too perfect to rewrite.

@eksortso Feel free!

Whee! Nice catch @Validark! ^>^

I think it definitely makes sense to allow this as valid TOML.

Since #767 is merged, we can now say yes to the original question. @Validark, if you're happy with the update, cold you close this issue?