Support different ways of ingesting data for TPC-H and TPC-DS

Question



This would be very useful, e.g. for ingesting zstd-compressed files to speed up ingestion of small datasets (up to, say, 100 GB).
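A minimal sketch of what the reading side could look like, assuming the stored files are dbgen-style pipe-delimited `.tbl` files in which every row ends with a trailing `|`. The function names `open_text` and `read_tbl_rows` are hypothetical. For `.zst` files it assumes the third-party `zstandard` package (`pip install zstandard`); `.gz` and uncompressed files need only the standard library.

```python
import contextlib
import gzip
import io
from pathlib import Path
from typing import Iterator, List, TextIO


@contextlib.contextmanager
def open_text(path: Path) -> Iterator[TextIO]:
    """Open a possibly compressed file as UTF-8 text, chosen by its last suffix."""
    if path.suffix == ".zst":
        import zstandard  # third-party dependency, assumed to be installed

        with open(path, "rb") as raw:
            reader = zstandard.ZstdDecompressor().stream_reader(raw)
            with io.TextIOWrapper(reader, encoding="utf-8") as text:
                yield text
    elif path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as text:
            yield text
    else:
        with open(path, "r", encoding="utf-8") as text:
            yield text


def read_tbl_rows(path: Path) -> Iterator[List[str]]:
    """Lazily yield the fields of each row of a pipe-delimited .tbl file."""
    with open_text(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("|")
            # dbgen ends each row with '|', which leaves an empty last field.
            if fields and fields[-1] == "":
                fields.pop()
            yield fields
```

Because rows are yielded one at a time, a large table never has to be held in memory in full, and the same loader works whether the file was stored compressed or not.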

Answer 1 · 2020-04-16T13:46:47.000Z

Can you please elaborate a bit?

Answer 2 · 2020-04-16T13:48:09.000Z

The majority of the ingest time is now spent generating the data. If you have enough disk space (e.g. on big-bertha), storing the generated input gives a much quicker turnaround when you need to run a set of benchmarks.

Answer 3 · 2020-04-16T13:49:59.000Z

We could add a data-source flag (or similar) to these benchmarks. That way, if the user already has data files, the benchmark would ingest them and fall back to the generator otherwise.
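One possible shape for that flag, as a sketch only: the `--data-dir` flag name, the `<table>.tbl[.zst|.gz]` file naming convention, and the helpers `find_pregenerated`, `generate_table` and `plan_ingest` are all hypothetical, and `generate_table` is a placeholder for whatever currently invokes the generator. Each table is resolved independently, so a partially populated directory still works.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical naming convention for pre-generated files, checked in this order.
CANDIDATE_SUFFIXES = (".tbl.zst", ".tbl.gz", ".tbl")


def find_pregenerated(data_dir: Optional[Path], table: str) -> Optional[Path]:
    """Return the first existing file for `table` in `data_dir`, or None."""
    if data_dir is None:
        return None
    for suffix in CANDIDATE_SUFFIXES:
        candidate = data_dir / f"{table}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def generate_table(table: str, scale_factor: float) -> Path:
    """Hypothetical hook: would run the existing generator and return its output file."""
    raise NotImplementedError("connect this to the benchmark's data generator")


def plan_ingest(
    tables: Iterable[str],
    data_dir: Optional[Path],
    scale_factor: float,
    generate: Callable[[str, float], Path] = generate_table,
) -> Dict[str, Path]:
    """Map each table to a pre-generated file if one exists, else to generated data."""
    sources: Dict[str, Path] = {}
    for table in tables:
        existing = find_pregenerated(data_dir, table)
        # Fall back to the generator only for tables missing from data_dir.
        sources[table] = existing if existing is not None else generate(table, scale_factor)
    return sources


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="TPC-H / TPC-DS ingest (sketch)")
    parser.add_argument(
        "--data-dir",  # hypothetical flag name
        type=Path,
        default=None,
        help="directory of pre-generated table files; missing tables are generated",
    )
    parser.add_argument("--scale-factor", type=float, default=1.0)
    return parser.parse_args(argv)
```

If the `generate` hook also wrote its output into `data_dir`, a machine with enough disk space would keep the generated files, and every later run would take the fast path.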

Answer 4 · 2020-04-16T13:50:27.000Z

That would be very cool :)