SunDoge/simdjson-rust

misleading benchmarks

Opened this issue · 4 comments

Hi,

I wanted to suggest changing the benchmark output slightly; as currently presented, it is somewhat misleading.

The way serde_json and simd-json treat the Dom is very different from how simdjson treats the Dom. Both are valid tradeoffs to make, but comparing them is not very meaningful.

Both serde_json and simd-json, when presenting a Dom, create a nested data structure that is modifiable and has indexed maps: a data structure in its own right. That comes at the cost of allocating and filling those data structures, but it's a valid tradeoff when either the maps are accessed frequently or the data needs to be modified.

simdjson presents a pointer to the tape as a Dom, which means it does not perform extra allocations but does not allow mutations, and lookups are always in linear time.
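To make the difference concrete, here is a toy sketch in Rust using only the standard library. It is not how simdjson or simd-json actually lay out their tape; `TapeNode`, `DomValue`, `tape_get`, and `tape_to_dom` are made-up names for illustration, and the "tape" only models a single flat object with integer values.

```rust
use std::collections::HashMap;

/// Toy "tape": a flat, read-only sequence of alternating keys and values.
/// (Hypothetical layout, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum TapeNode {
    Key(String),
    Int(i64),
}

/// Toy "Dom": an owned, mutable value with a hashed map for objects.
#[derive(Debug, Clone, PartialEq)]
enum DomValue {
    Int(i64),
    Object(HashMap<String, DomValue>),
}

/// Linear-time lookup: walk key/value pairs on the tape until the key matches.
fn tape_get(tape: &[TapeNode], key: &str) -> Option<i64> {
    tape.chunks(2).find_map(|pair| match pair {
        [TapeNode::Key(k), TapeNode::Int(v)] if k.as_str() == key => Some(*v),
        _ => None,
    })
}

/// Building the Dom allocates a map and copies every entry into it.
fn tape_to_dom(tape: &[TapeNode]) -> DomValue {
    let mut map = HashMap::new();
    for pair in tape.chunks(2) {
        if let [TapeNode::Key(k), TapeNode::Int(v)] = pair {
            map.insert(k.clone(), DomValue::Int(*v));
        }
    }
    DomValue::Object(map)
}

fn main() {
    let tape = vec![
        TapeNode::Key("a".to_string()),
        TapeNode::Int(1),
        TapeNode::Key("b".to_string()),
        TapeNode::Int(2),
    ];

    // Tape access: no extra allocation, but a linear scan and no mutation.
    println!("{:?}", tape_get(&tape, "b"));

    // Dom access: extra allocation up front, then hashed lookups and mutation.
    let mut dom = tape_to_dom(&tape);
    if let DomValue::Object(map) = &mut dom {
        map.insert("c".to_string(), DomValue::Int(3));
        println!("{:?}", map.get("c"));
    }
}
```

A benchmark that stops at the tape skips the work done in `tape_to_dom`, while one that produces a Dom pays for it, so the two timings measure different amounts of work.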

Again, both are valid tradeoffs for different use cases. However, comparing them is problematic as what we compare isn't the same result.

I think the best way would be to create a third category alongside Dom and Struct, called Tape, which is the fully validated JSON but not put into a nested data structure. serde_json does not provide an interface like that. simd-json does provide `to_tape`, which yields a data structure equivalent to simdjson's but without the nicer access functions (those should be easy to implement oneself or add).

Thanks for your reminder. I'll convert the dom to serde_json's Value and rebenchmark it.

@SunDoge I think both should be included in the benchmarks. Converting might not be needed for all the applications.

Yes, both is definitely the best, and it isn't a big issue if not all libraries support all target formats.

FWIW, simd-json now has DOM-like read-only access to the tape, so it would be possible to include the DOM versions in the benchmark as well.