A tpc-h bench for ClickHouse.
Code modified from https://github.com/pingcap/tiup.
git clone git@github.com:jiyfhust/ClickHouse-tpch-bench.git
cd ClickHouse-tpch-bench
make
./tiup-bench tpch -D bench -H ck_host -P 9004 -U jiyf -p abc --sf=1 prepare
./tiup-bench tpch -D bench -H ck_host -P 9004 -U jiyf -p abc --sf=1 run
./tiup-bench tpch -D bench -H ck_host -P 9004 -U jiyf -p abc --sf=1 cleanup
9004 is mysql port of clickhouse
在 tiup tpc-h 上进行了一些修改:
- 修改了建表 SQL 以兼容 ClickHouse
- 修改了部分语句参数,例如 sql 中 date_add 函数 ClickHouse 不兼,将 date_add('1995-01-01', interval '3' month) 改为 ’1995-04-01‘
- 修改了不兼容的类型计算,例如数值 0.06 - 0.01 改为 toDecimal64(0.06 - 0.01, 2) 以兼容类型比较运算
- 对部分 sql 语句进行了更改,ClickHouse 对于 tpc-h 上很多 sql 语法上不支持,更改过程中有的改变了语义,后面会专门列出。
- ClickHouse 对于复杂查询性能较差,q19 没有跑出来结果,所以执行完 q18 就可以退出了,q20、q21、q22 没有对语句进行兼容性改写。
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 30 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier,
nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
改写后语义有改变
/*PLACEHOLDER*/ select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 30 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and ps_supplycost = ( select min(ps_supplycost) from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' ) order by s_acctbal desc, n_name, s_name, p_partkey limt 100;
/*PLACEHOLDER*/ select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-01-01' and o_orderdate < '1995-04-01' and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
改为后,语义没有改变。
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= '1995-01-01' and o_orderdate < '1995-04-01' and o_orderkey in ( select o_orderkey from lineitem,orders where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
/*PLACEHOLDER*/ select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand =
'Brand#44' and p_container = 'WRAP PKG' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
改为了,语义变了
select sum(l_extendedprice) / toDecimal64(7.0, 2) as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand =
'Brand#44' and p_container = 'WRAP PKG' and toFloat64(l_quantity) < ( select 0.2 * avg(l_quantity) from lineitem,part where l_partkey
= p_partkey and p_brand = 'Brand#44' and p_container = 'WRAP PKG' );