horoscope is an optimizer inspector for DBMS.
-
Install
- Install using script(recommend)
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/chaos-mesh/horoscope/master/install.sh | sh
Then open a new terminal and try
horo -h
.NAME: horoscope - An optimizer inspector for DBMS USAGE: horo [global options] command [command options] [arguments...] COMMANDS: init, i initialize workload test test the optimizer gen, g Generate a dynamic bench scheme query, q Execute a query hint, H Explain hint of a query explain, e Explain analyze a query info Show database information index Add indexes for tables card test the cardinality estimations split, s Split data into several slices load Load data in a directory help, h Shows a list of commands or help for one command GLOBAL OPTIONS: --dsn DSN, -d DSN set DSN of target db (default: "root:@tcp(localhost:4000)/test?charset=utf8") --workload DIR, -w DIR workload DIR of horo (default: "workload") --json, -j format log with json formatter (default: false) --file FILE, -f FILE set FILE to store log --verbose LEVEL, -v LEVEL set LEVEL of log: trace|debug|info|warn|error|fatal|panic (default: "info") --max-open-conns numbers the max numbers of connections (default: 100) --max-idle-conns numbers the max numbers of idle connections (default: 20) --max-lifetime seconds the max seconds of connections lifetime (default: 10) --not-save do not save options (default: false) --help, -h show help (default: false)
- Build from source
git clone https://github.com/chaos-mesh/horoscope.git make
Then try
bin/horo -h
. -
Initialize Workload
Enter a clean directory and execute:
horo init
You may fail because there is no tidb or mysql server listening on
localhost:4000
. A custom data source name or workload directory is also supported:horo -d "root@tcp(172.20.1.1)/test" -w . init
All options will be saved in
horo.json
of current directory:{ "main": { "workload": ".", "dsn": "root@tcp(172.20.1.1:4000)/test", "json_formatter": false, "log_file": "", "verbose": "trace", "pool": { "max_open_conns": 100, "max_idle_conns": 20, "max_life_seconds": 10 } }, "bench": { "round": 1, "need_prepare": false, "disable_collect_card_error": false, "no_verify": false, "report_fmt": "table" }, "card": { "columns": "", "type": "emq", "timeout": 0 }, "query": { "plan_id": 0 }, "generate": { "queries": 20, "and_op_weight": 3, "mode": "op-compose", "generator": { "max_tables": 3, "min_duration_threshold": 10000000, "limit": 100, "key_only": false, "unstable_order_by": false, "max_by_items": 3, "enable_key_map": false, "aggregate_weight": 0.5 } }, "index": { "max_indexes": 10, "compound_level": 1, "reserve_indexes": false }, "info": { "table": "" }, "load": { "data_source": "" }, "split": { "group": "", "slices": 100, "batch_size": 100, "use_bit_array": false } }
-
Generate Queries
horo gen -c 3
-
Start Benching
horo test -p
horo -w benchmark/tpch test -p -r 4
For example, measures the EMQ(exact match queries) row cnt error on customer.C_NAME
for total 100 seconds.
horo card -columns 'customer.C_NAME' -type emq -timeout 100s
There will generate a summary report after bench
sub-command is finished.
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | #PLAN SPACE | DEFAULT EXECUTION TIME | BEST PLAN EXECUTION TIME | EFFECTIVENESS | BETTER OPTIMAL PLANS | ESTROW Q-ERROR | QUERY |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| q3 | 11 | 12061.0ms ±11% | 5401.8ms ±19% | 72.7% | #6(44.8%),#10(66.4%),#11(47.3%) | count:3, median:1.0, 90th:7173270.0, 95th:7173270.0, max:7173270.0 | SELECT l_orderkey,sum(l_extendedprice*(1-l_discount)) AS revenue,o_orderdate,o_shippriority FROM ((customer) JOIN orders) JOIN lineitem WHERE c_mktsegment="AUTOMOBILE" AND c_custkey=o_custkey AND l_orderkey=o_orderkey AND o_orderdate<"1995-03-13" AND l_shipdate>"1995-03-13" GROUP BY l_orderkey,o_orderdate,o_shippriority ORDER BY revenue DESC,o_orderdate LIMIT 10 |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
ID
: query id#PLAN SPACE
: the plan space size of a queryDEFAULT EXECUTION TIME
: the execution time of default plan, giving in the format of "Mean ±Diff", "Mean" is the mean value ofround
rounds, and "Diff" is the lower/upper bound of the mean valueBEST PLAN EXECUTION TIME
: the execution time of the best planEFFECTIVENESS
: the percent of the execution time of the default plan better than others on plan space- We use Pd to represent the default plan generated for the query, Pi as one of plan on plan space
- If execution time(Pi) < 0.9 * execution time(Pd), Pi is a better plan
BETTER OPTIMAL PLANS
: gives the better plan, each item is giving in the format of "nth_plan id(execution time / default execution time)"ESTROW Q-ERROR
: Base table row cnt estimation q-error for each queryQUERY
: the query
We integrate the SQL queries of TPCH, TPCDS, SSB, and JOB benchmarks on the repo, you can use go-tpc and tidb-bench to import the dataset.
For the JOB benchmark, join-order-benchmark is helpful.
Refer to index selection fuzz