libAtoms/extxyz

Ambiguity in the grammar between quoted strings and backward compatible arrays

Luthaf opened this issue · 6 comments

For example, how should software interpret the value in key="a b c"? It could be either a single string containing spaces (a b c) or a backward compatible array of three strings (["a", "b", "c"]).

I think the easiest resolution of this ambiguity is to limit quoted arrays to only integer/real/boolean values, the same way '{' arrays are limited to string values. All the extxyz files I've seen in the wild use arrays in this way, so backward compatibility should be preserved. Do you agree @jameskermode?
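To make the proposed rule concrete, here is a minimal Python sketch. It is not the extxyz or pyleri implementation: the function name `interpret_quoted`, the regular expressions, and the accepted boolean spellings are all assumptions made for illustration. Under the proposed rule, a quoted value is treated as an array only if every whitespace-separated token is an integer, a real, or a boolean; anything else stays a plain string.

```python
import re

# Hypothetical token patterns; the real grammar may accept other spellings.
_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")
_BOOLS = {"T": True, "F": False, "true": True, "false": False}


def _to_int(token):
    if not _INT.fullmatch(token):
        raise ValueError(token)
    return int(token)


def _to_float(token):
    if not _FLOAT.fullmatch(token):
        raise ValueError(token)
    return float(token)


def _to_bool(token):
    if token not in _BOOLS:
        raise ValueError(token)
    return _BOOLS[token]


def interpret_quoted(text):
    """Interpret the text found between double quotes.

    Returns a list (or a scalar for a single element) when all tokens
    share one numeric/boolean type, and the unchanged string otherwise.
    """
    tokens = text.split()
    if not tokens:
        return text
    # Try the narrowest type first so "1 2 3" stays integer.
    for convert in (_to_int, _to_float, _to_bool):
        try:
            values = [convert(token) for token in tokens]
        except ValueError:
            continue
        # A single-element backward compatible array is a scalar.
        return values[0] if len(values) == 1 else values
    return text


print(interpret_quoted("a b c"))
print(interpret_quoted("1 2 3"))
print(interpret_quoted("T F"))
```

With this rule `key="a b c"` has exactly one reading (a string), so the ambiguity disappears, while files that store numeric vectors in quotes keep their current meaning.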

The 1D-array section (https://github.com/libAtoms/extxyz#one-dimensional-array-vector) of the readme says

backward compatible: opens with " or {, one or more of the same primitive types (strings only in {}) separated by whitespace, ends with matching " or }. For backward compatibility, a single element backward compatible array is interpreted as a scalar of the same type.

And "primitive types" allow strings, so that's how I understood this section. The pyleri grammar, however, does not allow strings inside quoted arrays:

old_one_d_array = Choice(Sequence('"', Choice(ints_sp, ints, floats_sp, floats, bools_sp, bools), '"'),
so I guess the main thing to update is the documentation!

Although, looking at the code, the next line in the pyleri grammar says

Sequence('{', Choice(ints_sp, ints, floats_sp, floats, bools_sp, bools, strings_sp, strings), '}'))

while the doc says only string values are allowed inside { arrays, so there is some inconsistency here.

The revised wording is much clearer, thanks!