swarm64/s64da-benchmark-toolkit

Support different ways of ingesting data for TPC-H and TPC-DS

Opened this issue · 4 comments

This would be very useful, e.g. to ingest zstd-compressed files and so speed up the ingest for small datasets (up to e.g. 100 GB).

Can you please elaborate a bit?

The majority of the ingest time is currently spent generating the data. If you have enough disk space (e.g. on big-bertha), storing the generated input makes for a much quicker turnaround when you have to try a set of benchmarks.

We could add a data-source flag or similar to these benchmarks. If the user has data files ready, the ingest would work from those; otherwise it would fall back to the generator.
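
A rough sketch of how that fallback could look, just to make the idea concrete. Everything here is an assumption for illustration and not the toolkit's current API: the `--data-source` flag name, the `build_parser` and `pick_source` helpers, and the file naming (`<table>.tbl`, `<table>.dat`, optionally with a `.zst` suffix).

```python
import argparse
from pathlib import Path

# Hypothetical file suffixes, checked in order. Compressed files come first
# so that a stored .zst file is preferred over an uncompressed one.
CANDIDATE_SUFFIXES = (".tbl.zst", ".dat.zst", ".tbl", ".dat")


def build_parser():
    """Build a parser with a hypothetical --data-source option."""
    parser = argparse.ArgumentParser(description="Ingest sketch (illustrative only)")
    parser.add_argument(
        "--data-source",
        type=Path,
        default=None,
        help="Directory with pre-generated data files; "
             "tables without a file fall back to the generator.",
    )
    return parser


def pick_source(table, data_source):
    """Decide where the rows for one table come from.

    Returns ("file", path) if a pre-generated file for the table exists in
    data_source, otherwise ("generator", None).
    """
    if data_source is not None:
        for suffix in CANDIDATE_SUFFIXES:
            candidate = data_source / f"{table}{suffix}"
            if candidate.is_file():
                return ("file", candidate)
    # No directory given, or no matching file found: generate the data.
    return ("generator", None)
```

The ingest loop would then call `pick_source(table, args.data_source)` per table and either stream the file into the database (decompressing it first if it ends in `.zst`) or run the data generator as it does today. Deciding per table keeps a partially populated directory usable.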

that would be very cool :)