Data format normalization
coinex opened this issue · 1 comments
It might be useful to "normalize" results by moving objects into maps and making copies of input strings, for libraries that destroy their inputs and use lists for maps, just to see how much of the penalty is a result of these shortcuts. Conformance testing helps a lot here: you can see some very fast libraries that sacrifice conformance. However, internal data format and overall "friendliness" is another area to compare.
For example:
- picojson does not destroy its input, and expends effort storing "objects" as std::map types rather than a list/vector of pairs
- gason destroys its input - and also stores objects as lists of pairs.
But the parsers are pretty similar otherwise, and should perform about the same; a rough sketch of the two layouts is below.
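Purely as an illustration of the two object layouts (these type names and the `FindMember` helper are hypothetical, not taken from picojson or gason):

```cpp
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// picojson-style: keys and string values are copied into std::string and
// members live in a std::map, so the original input buffer can be freed.
using MapObject = std::map<std::string, std::string>;

// gason-style: keys and values are pointers into the input buffer (which
// the parser has mutated in place), kept as an ordered list of pairs.
using ListObject = std::vector<std::pair<const char*, const char*>>;

// Member lookup in the list layout is a linear scan: cheap to build,
// but O(n) per access instead of O(log n).
const char* FindMember(const ListObject& obj, const char* key) {
    for (const auto& member : obj)
        if (std::strcmp(member.first, key) == 0)
            return member.second;
    return nullptr;
}
```

The copying/allocation cost of the first layout and the lookup cost of the second are exactly the kind of differences that are hard to factor out of a parse-only benchmark.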
It is quite difficult to "normalize" these differences. Some parsers get better performance from specially designed containers, and copying a parsed data structure into another format adds a lot of overhead.
It may be useful to add some benchmarks on DOM accessing speed, so that map/unordered_map implementations would get better results there.
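A rough sketch of what such a DOM-access benchmark could measure; `Document` and `HasMember` are just placeholders for whatever DOM type and member-lookup call each library exposes, not part of the current harness:

```cpp
#include <chrono>

// Parse once outside the timed region, then time repeated member lookups,
// so containers with faster lookup (map/unordered_map) are rewarded and
// list-of-pairs layouts pay for their linear scans.
template <typename Document>
double BenchmarkMemberLookup(const Document& dom, const char* key, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        volatile bool found = dom.HasMember(key);  // volatile: keep the loop from being optimized away
        (void)found;
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();  // seconds
}
```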
The example you stated will also be reflected in some benchmarks. For example, destroying the input (in situ parsing) will normally give a higher memory footprint, because the parts of the input buffer that do not hold string values cannot be released.
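A simplified illustration of that footprint effect (hypothetical `InsituValue` type and hand-written offsets, not any library's actual code): string values point into the original buffer, so the whole buffer has to stay allocated even though the bytes holding numbers, punctuation and whitespace are no longer needed.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// An in situ parser stores string values as pointers into the input buffer.
struct InsituValue {
    const char* str;  // points into `buffer`, not a copy
};

int main() {
    const char* json = "{\"name\":\"foo\",\"pi\":3.14159,\"big\":[1,2,3,4,5]}";
    char* buffer = static_cast<char*>(std::malloc(std::strlen(json) + 1));
    std::strcpy(buffer, json);

    // The parser unescapes/terminates strings in place and keeps pointers:
    InsituValue name{buffer + 9};  // "foo" inside the buffer
    buffer[12] = '\0';             // terminator written over the closing quote

    std::printf("%s\n", name.str);

    // The DOM only needs "foo", but the entire buffer (including the number
    // and array text) must remain allocated for as long as `name` is used.
    std::free(buffer);  // safe only once the DOM is no longer in use
    return 0;
}
```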
Actually, some "normalization" has been done already. For example, some parsers lazily skip the number parsing part. To be fair, we should assume every value is parsed, so an additional conversion pass is added to those parsers' tests.
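Something along these lines (hypothetical `Node` type, not the benchmark's actual harness): the extra pass walks the DOM and forces every number to be converted, so a library that stores numbers as unparsed text pays the same conversion cost as the others.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Minimal DOM node for the sketch: numbers may be kept as raw text.
struct Node {
    enum Type { Number, String, Object, Array } type;
    std::string text;            // raw number text for lazily parsing libraries
    std::vector<Node> children;  // members/elements for objects and arrays
};

// Recursively force every numeric value to be converted to double.
// The running sum keeps the conversions from being optimized away.
double ForceNumberConversion(const Node& node) {
    double sum = 0.0;
    if (node.type == Node::Number)
        sum += std::strtod(node.text.c_str(), nullptr);
    for (const Node& child : node.children)
        sum += ForceNumberConversion(child);
    return sum;
}
```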