WeTune is a rule generator that can automatically discover new query rewrite rules for SQL query optimization.
Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, and Jinyang Li. 2022. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22).
This codebase includes the source code and testing scripts for the paper WeTune: Automatic Discovery and Verification of Query Rewrite Rules.
|-- click-to-run # Click-to-run scripts for WeTune.
|-- lib # Required external library.
|-- common # Common utilities.
|-- sql # Data structures of SQL AST and query plan.
|-- stmt # Manager of queries from open-source applications.
|-- superopt # Core algorithm of WeTune.
|-- fragment # Plan template enumeration.
|-- constraint # Constraint enumeration.
|-- uexpr # U-expression.
|-- logic # SMT-based verifier.
|-- optimizer # Rewriter.
|-- testbed # Evaluation framework.
|-- spes/ # Integrated SPES-based verifier.
|-- wtune_data/ # Data input/output directory.
|-- schemas/ # Schemas of applications.
|-- schemas_mssql/ # Schemas of applications for SQL Server.
|-- wtune.db # SQLite DB storing persistent statistics.
- Java 17
- Gradle 7.3.3
- z3 4.8.9 (SMT solver)
- antlr 4.8 (Generate tokenizer and parser for SQL AST)
- Microsoft SQL Server 2019 (Evaluate the usefulness of rules)
The z3 and antlr libraries are provided off-the-shelf in lib/.
You can follow the steps in the script below to set up your environment.
click-to-run/environment-setup.sh
gradle compileJava
This section gives instructions for the whole WeTune workflow, including:
- rule discovery
- rewriting queries using the rules
- picking useful rules by evaluating the rewritings
The whole procedure typically takes several days (mainly for rule discovery).
If you are particularly interested in how WeTune works, please refer to the section Run Example, which gives instructions for running a few individual examples of enumeration and equivalence proof. The detailed output will help you understand the internals of WeTune.
# Launch background processes to run rule discovery
click-to-run/discover-rules.sh
# After all processes have finished:
click-to-run/collect-rules.sh && click-to-run/reduce-rules.sh
# Check progress:
click-to-run/show-progress.sh
# Use this to terminate all processes
click-to-run/stop-discovery.sh
The first command launches many processes running in the background.
The procedure will consume all CPUs and take a long time (~3600 CPU hours) to finish. The rules discovered so far can be found in wtune_data/enumeration/run_*/success.txt (* here is a timestamp).
The second command aggregates the wtune_data/enumeration/run_*/success.txt files and reduces the rules (Section 7 in the paper). The resulting rules can be found in wtune_data/rules/rules.txt.
The third command checks the progress. The fourth terminates all background tasks launched by the first command.
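The collect step is conceptually a concatenate-and-deduplicate pass over the per-run output files. Below is a minimal sketch of that pattern; the run_*/success.txt layout comes from the paths above, but the one-rule-per-line format and the sample rule strings are assumptions for illustration, not WeTune's actual serialization:

```python
import glob
import os
import tempfile

def collect_rules(enum_dir):
    """Aggregate success.txt from every run_* directory, dropping duplicates
    while preserving first-seen order."""
    seen, rules = set(), []
    for path in sorted(glob.glob(os.path.join(enum_dir, "run_*", "success.txt"))):
        with open(path) as f:
            for line in f:
                rule = line.strip()
                if rule and rule not in seen:
                    seen.add(rule)
                    rules.append(rule)
    return rules

# Demonstrate on a mock enumeration directory with two runs.
with tempfile.TemporaryDirectory() as tmp:
    for run, content in [("run_20220101", "rule-A\nrule-B\n"),
                         ("run_20220102", "rule-B\nrule-C\n")]:
        os.makedirs(os.path.join(tmp, run))
        with open(os.path.join(tmp, run, "success.txt"), "w") as f:
            f.write(content)
    print(collect_rules(tmp))  # → ['rule-A', 'rule-B', 'rule-C']
```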
Why multi-process instead of multi-thread?
z3 incurs high inter-thread lock contention. The scripts therefore use multiple processes instead of multiple threads to mitigate this problem.
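As an illustration of why this helps, here is a toy sketch (not WeTune's code) of fanning CPU-bound verification work out to worker processes, each of which would own a private z3 context instead of contending on shared in-process locks. The verify stand-in is hypothetical:

```python
from multiprocessing import Pool

def verify(candidate):
    # Stand-in for running the SMT verifier on one rule candidate; in a real
    # setup each worker process holds its own z3 context, so there is no
    # cross-thread lock contention inside a single z3 instance.
    return candidate, candidate % 3 == 0  # fake "provable" predicate

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # analogous to the background processes
        results = pool.map(verify, range(12))
    print([c for c, ok in results if ok])  # → [0, 3, 6, 9]
```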
Since rule discovery takes a substantial amount of time, we have also provided pre-enumerated rules in wtune_data/prepared/rules.txt.
click-to-run/rewrite-queries.sh [-R <path/to/rules>]
This script uses the rules in <path/to/rules> to rewrite 8000+ web application queries.

-R: path to the rule file, relative to wtune_data/. Default: wtune_data/rules/rules.txt if it exists, otherwise wtune_data/prepared/rules.txt.

The rewritten queries can be found in wtune_data/rewrite/result/1_query.tsv.
# Estimate the cost of rewritten queries and pick one with the minimal cost
# The prepared `base` workload is used to estimate the queries' cost
click-to-run/prepare-workload.sh -tag base
click-to-run/estimate-cost.sh
# Profile the performance of rewritten queries
# If you profile on the `base` workload type, there is no need to run `prepare-workload.sh` again
click-to-run/prepare-workload.sh [-tag <workload_type>]
click-to-run/profile-cost.sh [-tag <workload_type>]
These scripts pick the optimized queries and profile them using a SQL Server database.
You can change the database connection parameters in the scripts above and in the file common/src/main/java/wtune/common/datasource/DbSupport.java according to your evaluation environment.
Use click-to-run/prepare-workload.sh to prepare the profiling workload data in the SQL Server database.
It creates databases and the corresponding schemas in SQL Server, then generates and imports data.
The dumped data files can be found in the directory wtune_data/dump/.
click-to-run/estimate-cost.sh takes the previously generated file wtune_data/rewrite/result/1_query.tsv as input and picks the rewritten query with the minimal cost by asking the database's cost model. The result will be stored in wtune_data/rewrite/result/2_query.tsv, and the used rules will be stored in wtune_data/rewrite/result/2_trace.tsv.
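The selection step amounts to a per-query argmin under the database's cost model. Below is a minimal sketch of that logic, assuming a hypothetical (query_id, candidate_sql) row layout and a stand-in cost function; the real script asks SQL Server's cost model and reads/writes the TSV files named above:

```python
def pick_min_cost(rows, cost_of):
    """rows: (query_id, candidate_sql) pairs, with the original query included
    as a candidate; cost_of: candidate_sql -> estimated cost.
    Returns the cheapest candidate per query."""
    best = {}
    for qid, sql in rows:
        if qid not in best or cost_of(sql) < cost_of(best[qid]):
            best[qid] = sql
    return best

# Hypothetical candidates for one query, with made-up cost estimates.
candidates = [
    ("q1", "SELECT * FROM t WHERE x IN (SELECT x FROM s)"),               # original
    ("q1", "SELECT * FROM t WHERE EXISTS (SELECT 1 FROM s WHERE s.x = t.x)"),  # rewrite
]
fake_cost = {candidates[0][1]: 10.0, candidates[1][1]: 4.0}
print(pick_min_cost(candidates, fake_cost.__getitem__))
# → {'q1': 'SELECT * FROM t WHERE EXISTS (SELECT 1 FROM s WHERE s.x = t.x)'}
```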
click-to-run/profile-cost.sh evaluates the optimized queries. The output file is in wtune_data/profile/result/ by default.
-tag: specifies the workload type. The default type is base. See details below.
In the paper, we evaluate query performance on 4 different workload types:
Workload type | # of rows | Data distribution |
---|---|---|
base | 10 k | uniform |
zipf | 10 k | zipfian |
large | 1 M | uniform |
large_zipf | 1 M | zipfian |
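For intuition about the two distributions, the sketch below contrasts uniform and zipfian sampling (a toy sampler, not WeTune's actual data generator): under a zipfian workload a few "hot" rows receive most references, while uniform data spreads references evenly.

```python
import random

def zipf_sample(n, s=1.0, rng=random):
    # Sample a rank in [1, n] with probability proportional to 1 / rank**s.
    weights = [1.0 / rank ** s for rank in range(1, n + 1)]
    return rng.choices(range(1, n + 1), weights=weights)[0]

rng = random.Random(42)
uniform_hits = sum(rng.randint(1, 100) == 1 for _ in range(1000))
zipf_hits = sum(zipf_sample(100, rng=rng) == 1 for _ in range(1000))
# The hottest rank is drawn far more often under the zipfian distribution
# (~19% of draws for s=1, n=100) than under uniform (~1%).
print(zipf_hits > uniform_hits)  # → True
```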
If you would like to evaluate a different type of workload, you can pass the -tag option to the scripts.
For example, to evaluate queries on the zipf workload type, run:
click-to-run/prepare-workload.sh -tag zipf
click-to-run/profile-cost.sh -tag zipf
The profiling results are stored in the file wtune_data/profile/result/{workload_type} by default.
Finally, you can run:
click-to-run/view-all.sh
to view rewriting and profiling results together.
The resulting files are stored in wtune_data/viewall/result/
by default.
|-- viewall
|-- result
|-- rules.tsv # Used rules during rewriting
|-- statistic.tsv # Rewritten queries and profiling results
WeTune has integrated an existing verifier called SPES (Zhou et al. SPES: A Two-Stage Query Equivalence Verifier. VLDB '20.) to improve its ability to discover rules. The workflow of using SPES to discover rules and to rewrite and evaluate queries is similar to the one in the WeTune Workflow section.
# Launch background processes to run rule discovery
click-to-run/discover-rules-continuous.sh -spes
# After all processes have finished:
click-to-run/collect-rules.sh && click-to-run/reduce-rules.sh -spes
# Check progress:
click-to-run/show-progress.sh
# Use this to terminate all processes
click-to-run/stop-discovery.sh
The first command uses the SPES verifier to prove rule correctness. We use click-to-run/discover-rules-continuous.sh -spes instead when using the SPES verifier, since the processes may sometimes shut down unexpectedly in SPES's current implementation; this remains to be fixed in future work.
The rules discovered so far can also be found in wtune_data/enumeration/run_*/success.txt (* here is a timestamp).
The second command aggregates the wtune_data/enumeration/run_*/success.txt files and reduces the rules (Section 7 in the paper). The resulting rules can be found in wtune_data/rules/rules.spes.txt.
The third command checks the progress. The fourth terminates all background tasks launched by the first command.
For simplicity of demonstration, we separate the enumeration using the built-in verifier from the enumeration using the SPES verifier.
We have also provided the rules enumerated by the SPES verifier in wtune_data/prepared/rules.spes.txt.
click-to-run/rewrite-queries.sh -spes [-R <path/to/rules>]
This script uses the rules in <path/to/rules> to rewrite queries.

-R: path to the rule file, relative to wtune_data/. Default: wtune_data/rules/rules.spes.txt if it exists, otherwise wtune_data/prepared/rules.spes.txt.

The rewritten queries can be found in wtune_data/rewrite/result_spes/1_query.tsv.
# Estimate the cost of rewritten queries and pick one with the minimal cost
# The prepared `base` workload is used to estimate the queries' cost
click-to-run/prepare-workload.sh -tag base
click-to-run/estimate-cost.sh -spes
# Profile the performance of rewritten queries
# If you profile on the `base` workload type, there is no need to run `prepare-workload.sh` again
click-to-run/prepare-workload.sh [-tag <workload_type>]
click-to-run/profile-cost.sh -spes [-tag <workload_type>]
These scripts pick the optimized queries and profile them using a SQL Server database.
Use click-to-run/prepare-workload.sh to prepare the workload data in the SQL Server database.
click-to-run/estimate-cost.sh -spes takes the previously generated file wtune_data/rewrite/result_spes/1_query.tsv as input and picks the rewritten query with the minimal cost by asking the database's cost model. The result will be stored in wtune_data/rewrite/result_spes/2_query.tsv, and the used rules will be stored in wtune_data/rewrite/result_spes/2_trace.tsv.
click-to-run/profile-cost.sh -spes evaluates the optimized queries. The output file is in wtune_data/profile/result_spes/ by default.
Finally, you can run:
click-to-run/view-all.sh -spes
to view rewriting and profiling results together.
Corresponding output files are in wtune_data/viewall/result_spes/.
This section provides instructions for running examples of:
- template enumeration
- rule verification
- constraint enumeration
click-to-run/run-template-example.sh
All templates of max size 4 (after pruning) will be printed, 3113 in total.
click-to-run/run-verify-example.sh [rule_index]
<rule_index> can be 1-35, corresponding to Table 7 in the paper.
For each rule, the following items will be printed:
- The rule itself.
- A query q0, on which the rule can be applied.
- A query q1, the rewrite result after applying the rule to q0.
For rules 1-31, which can be proved by the WeTune built-in verifier, these additional items will be printed:
- A pair of U-expressions, translated from the rule.
- One or more Z3 snippets, the formulas submitted to Z3.
Why can there be more than one snippet?
To relieve the burden on Z3, when we are going to prove (p0 /\ p1 /\ ... /\ pn) /\ (q \/ r) is UNSAT, we separately prove that (p0 /\ p1 /\ ... /\ pn) /\ q and (p0 /\ p1 /\ ... /\ pn) /\ r are both UNSAT. This is particularly the case when applying Theorem 5.2.
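The split is sound because A /\ (q \/ r) is unsatisfiable exactly when both A /\ q and A /\ r are unsatisfiable (conjunction distributes over disjunction). A brute-force check of this equivalence over boolean assignments, in plain Python rather than Z3:

```python
from itertools import product

def unsat(formula, nvars):
    # True iff no boolean assignment satisfies `formula`.
    return not any(formula(*bits) for bits in product([False, True], repeat=nvars))

# A stands for the conjunction p0 /\ ... /\ pn, sampled here as a few
# representative predicates over q and r.
samples = [lambda q, r: False, lambda q, r: q, lambda q, r: not q,
           lambda q, r: q and not r, lambda q, r: not (q or r)]
for a in samples:
    whole = unsat(lambda q, r: a(q, r) and (q or r), 2)
    split = (unsat(lambda q, r: a(q, r) and q, 2)
             and unsat(lambda q, r: a(q, r) and r, 2))
    assert whole == split
print("UNSAT(A /\\ (q \\/ r)) == UNSAT(A /\\ q) and UNSAT(A /\\ r) for all samples")
```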
click-to-run/run-enum-example.sh [-dump] <rule_index>
<rule_index> can be 1-35, corresponding to Table 7 in the paper.
-dump specifies whether to dump all searched constraint sets to the output.
WeTune will enumerate the constraints between the plan templates of the given rule and search for the most-relaxed constraint sets. Each examined constraint set and its verification result will be printed. The discovered rules and metrics will be appended after the enumeration completes.
P.S. If -dump is specified, the output for some pairs floods for a few minutes; you may want to redirect it to a file.
The Calcite queries can be found in wtune_data/calcite/calcite_tests
.
We can use the discovered rules to rewrite and profile the Calcite queries as well, simply by adding the optional parameter -calcite to the related scripts in the workflow:
click-to-run/rewrite-queries.sh -calcite [-R <path/to/rules>]
The rewritten queries can be found in wtune_data/calcite/result/
.
# Estimate the cost of rewritten queries and pick one with the minimal cost
# The prepared `base` workload is used to estimate the queries' cost
click-to-run/prepare-workload.sh -calcite -tag base
click-to-run/estimate-cost.sh -calcite
# Profile the performance of rewritten queries
# If you profile on the `base` workload type, there is no need to run `prepare-workload.sh` again
click-to-run/prepare-workload.sh -calcite [-tag <workload_type>]
click-to-run/profile-cost.sh -calcite [-tag <workload_type>]
Use click-to-run/prepare-workload.sh -calcite to prepare the workload data in the SQL Server database.
click-to-run/estimate-cost.sh -calcite takes the previously generated file wtune_data/calcite/result/1_query.tsv as input and picks the rewritten query with the minimal cost by asking the database's cost model. The result will be stored in wtune_data/calcite/result/2_query.tsv, and the used rules will be stored in wtune_data/calcite/result/2_trace.tsv.
click-to-run/profile-cost.sh -calcite evaluates the optimized queries. The output file is in wtune_data/profile_calcite/result/ by default.
Finally, you can run:
click-to-run/view-all.sh -calcite
to view rewriting and profiling results together.
Corresponding output files are in wtune_data/viewall_calcite/result/
.
click-to-run/verify-calcite-query.sh
This script verifies the equivalence of Calcite query pairs by directly transforming each entire query pair into a rule. The ordinals (line numbers) of the 35 verifiable pairs will be printed, together with the rules.
click-to-run/verify-calcite-transformation.sh
This script verifies the equivalence of Calcite query pairs by rewriting both queries and checking whether the rewritten queries coincide, which effectively indicates that WeTune can verify the transformation between the two queries. The ordinals (line numbers) of the 73 verifiable pairs will be printed, together with the rewritten queries.