Requires a clone of the TPCH-sqlite repo: `git clone git@github.com:lovasoa/TPCH-sqlite.git`
- Generate `TPC-H.db`:
  `./tpch gen-tpch.sh [path to the TPCH-sqlite repo] [SCALE]` (generates the data under the `tpch` directory)
- Run `gen_dist_tpch.py` to distribute the data in `TPC-H.db`.
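Before distributing, it can help to check how many tuples each table in `TPC-H.db` actually holds. The snippet below is a minimal sketch that assumes `TPC-H.db` is an ordinary SQLite file. `table_row_counts` is a hypothetical helper written for illustration and is not part of this repo or of `gen_dist_tpch.py`.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in an SQLite database."""
    # Skip SQLite's internal sqlite_* tables.
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, after the generate step, `table_row_counts(sqlite3.connect("TPC-H.db"))` returns one entry per generated table.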
To be done:
- integrate sqlite, pandas, and numpy to make generating data for multi-node settings easy
- distribute data based on tuple-count distributions: `equal`, `left`, `right`, or `random`
- partition data into nodes based on table and columns (consistent hashing could be used)
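For the sqlite/pandas/numpy integration item, one possible shape is sketched below, under stated assumptions. A table is loaded with pandas, cut into contiguous near-equal parts with `numpy.array_split`, and each part is written to its own SQLite file. The per-node file layout (`node_0.db`, `node_1.db`, ...) and the names `split_table` and `write_parts` are hypothetical choices for illustration, not what `gen_dist_tpch.py` does.

```python
import sqlite3
from contextlib import closing

import numpy as np
import pandas as pd


def split_table(conn: sqlite3.Connection, table: str, n_nodes: int) -> list[pd.DataFrame]:
    """Load one table with pandas and cut it into n_nodes contiguous, near-equal parts."""
    df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    # array_split over row positions yields n_nodes index ranges whose sizes differ by at most 1.
    chunks = np.array_split(np.arange(len(df)), n_nodes)
    return [df.iloc[idx].reset_index(drop=True) for idx in chunks]


def write_parts(parts: list[pd.DataFrame], table: str, path_template: str = "node_{}.db") -> None:
    """Write part i to its own SQLite file (hypothetical layout: node_0.db, node_1.db, ...)."""
    for i, part in enumerate(parts):
        with closing(sqlite3.connect(path_template.format(i))) as node_conn:
            part.to_sql(table, node_conn, if_exists="replace", index=False)
            node_conn.commit()
```

Usage might look like `write_parts(split_table(conn, "lineitem", 4), "lineitem")` with `conn` opened on `TPC-H.db`. Splitting by row position only covers the `equal` case. Skewed distributions would replace `array_split` with explicit per-node counts.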
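The tuple-count distributions are only named in this list, so the sketch below assumes one interpretation. Under this reading, `equal` splits rows evenly, while `left` and `right` skew counts linearly toward the first or the last node. `random` draws node shares from a uniform Dirichlet distribution. These definitions and the helper name `tuple_counts` are assumptions for illustration only. Rounding uses the largest-remainder method so the per-node counts always sum to the table's row count.

```python
from typing import Optional

import numpy as np


def tuple_counts(n_rows: int, n_nodes: int, dist: str, seed: Optional[int] = None) -> list[int]:
    """Return how many tuples each node receives; the counts always sum to n_rows."""
    if dist == "equal":
        weights = np.ones(n_nodes)
    elif dist == "left":
        # Assumed meaning: linearly decreasing weights, so node 0 gets the most tuples.
        weights = np.arange(n_nodes, 0, -1, dtype=float)
    elif dist == "right":
        # Assumed meaning: linearly increasing weights, so the last node gets the most.
        weights = np.arange(1, n_nodes + 1, dtype=float)
    elif dist == "random":
        # Assumed meaning: random shares from a uniform Dirichlet draw.
        weights = np.random.default_rng(seed).dirichlet(np.ones(n_nodes))
    else:
        raise ValueError(f"unknown distribution: {dist!r}")
    shares = weights / weights.sum() * n_rows
    counts = np.floor(shares).astype(int)
    # Hand out the rows lost to flooring, largest fractional part first (ties: lower node first).
    remainder = int(n_rows - counts.sum())
    order = np.argsort(-(shares - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts.tolist()
```

The resulting counts could feed a splitter that slices each table at the cumulative boundaries, for example via `np.cumsum(tuple_counts(...))`.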
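For the partitioning item, a minimal consistent-hashing sketch follows. It assumes nodes are identified by strings, and each node is placed on the ring at several virtual points (`replicas`) to even out the load. MD5 serves only as a stable hash, because Python's built-in `hash()` is salted per process for strings and would not give reproducible placement. `HashRing` and its methods are hypothetical names, not an existing API.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys (e.g. a row's partition-column value) to nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key) -> int:
        # First 8 bytes of MD5 as an integer: stable across runs, unlike built-in hash().
        return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Place `replicas` virtual points for this node on the ring.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, key) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point at or after the key's hash, wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

To partition a table by a column, one could compute `df[column].map(ring.node_for)` and group rows by the result. The benefit over `hash(key) % n_nodes` is that removing a node only moves the keys that node owned. Every other key keeps its assignment.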