Support different ways of ingesting data for TPC-H and TPC-DS
Opened this issue · 4 comments
lucvlaming commented
This would be very useful, e.g. to ingest zstd-compressed files and so speed up the ingest for small datasets (up to e.g. 100G).
sdressler commented
Can you please elaborate a bit?
lucvlaming commented
Most of the ingest time is currently spent generating the data. If you have enough space (e.g. on big-bertha), storing the generated input makes for a much quicker turnaround when you have to try a set of benchmarks.
sdressler commented
We could add a data-source flag or similar to these benchmarks. If the user has data files ready, the benchmark would ingest from those; otherwise it would fall back to the generator.
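
A rough sketch of how such a flag could behave, purely for illustration. The flag name `--data-source`, the helper `resolve_ingest_source`, and the `<table>.tbl` / `<table>.tbl.zst` file naming are all assumptions, not something the benchmarks implement today:

```python
import argparse
from pathlib import Path
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the flag name is an assumption for this sketch.
    parser = argparse.ArgumentParser(description="TPC-H / TPC-DS ingest (sketch)")
    parser.add_argument(
        "--data-source",
        default=None,
        help="directory with pre-generated data files; generator is used if omitted",
    )
    return parser


def resolve_ingest_source(
    data_source: Optional[str], table: str
) -> Tuple[str, Optional[Path]]:
    """Pick where the rows for `table` come from.

    Returns ("files", path) when a stored file for the table exists in
    `data_source`, otherwise ("generator", None) as the fallback.
    """
    if data_source:
        base = Path(data_source)
        # Assumed naming scheme: prefer the zstd-compressed file if present.
        for suffix in (".tbl.zst", ".tbl"):
            candidate = base / f"{table}{suffix}"
            if candidate.is_file():
                return "files", candidate
    # No flag given, or no matching file: generate the data as before.
    return "generator", None
```

The fallback is decided per table here, so a directory that only holds some of the tables would still work, with the remaining tables coming from the generator.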
lucvlaming commented
that would be very cool :)