stephenberry/glaze

Allow empty NDJSON parsing

Closed this issue · 5 comments

I did not find anything specific in the NDJSON specifications, however I think that an empty string should be a valid NDJSON, i.e. an empty array. In particular I think it would be very convenient if the following roundtrip example would work with an empty vector:

int main([[maybe_unused]] const int argC, [[maybe_unused]] char* argV[]) {
	std::vector<int> foo;
	const auto       testJson = glz::write_ndjson(foo).value_or("Write error");
	std::cout << '"' << testJson << '"' << std::endl;
	const auto test = glz::read_ndjson<std::vector<int>>(testJson);
	if (!test) { std::cout << "Parse error" << std::endl; }
	else { std::cout << '"' << test.value().size() << '"' << std::endl; }
	return 0;
}

I like this idea, but I'm also cautious about breaking the specification.
The author of NDJSON is no longer supporting it and it has been superseded by JSON Lines, which is the specification that I would like the support in Glaze.

Note the second bullet:

  1. Each Line is a Valid JSON Value
    The most common values will be objects or arrays, but any JSON value is permitted.

Given that each line must be a valid JSON Value, this would exclude an empty buffer. Should we break this rule in this case? Let me know your thoughts and I'll think a little more on it before I implement this. Can you give an example of how this feature would help you?

Yes that makes sense, to use JSON Lines if NDJSON is discontinued.

I am not used to parse technical specifications, but I am not entirely sure whether this second bullet really excludes an empty string. One could argue that an empty string has no lines at all, therefore each line is valid JSON. This view is reinforced by the fact that the JSON Lines validator claims that a completely empty string is valid. Notably a string containing a single whitespace is shown as invalid (which makes sense as the string now contains a line). So I am not even sure if this would break the specification.

Assuming it does, I would indeed argue that it makes sense to break the specification:

  • I think for (ND)JSON in general and glaze in particular it makes sense to be able to (de-)serialize any array-like object. Excluding empty arrays seems like an unreasonable restriction, as those are common and legal (in c++).
  • I fail too see the downside of accepting empty strings as empty arrays. It is not ambiguous in any way.
  • Assuming empty strings are not legal, glaze is already breaking the specs by serializing empty arrays as empty strings. Consequently this would have to raise an error too.

My particular use case is that I have several processes communicating via files, exchanging mainly c++ vectors. The files are usually appended by one process, making NDJson ideal, opposed to a normal JSON array which cannot be trivially appended because of the closing bracket. In some cases the files stay empty for some time, currently resulting in deserialization errors when read by other processes. It is not hard to fix, by including an extra check for empty files, but I just think it should not be necessary.

I raised an issue in the Json lines github, to clarify whether empty strings are valid in the specs.

Thanks for your detailed response. After thinking about it more I totally agree with you.

I just merged in support (#1381) and added a simple unit test. Thanks for reporting this!