This code contains a sample implementation for Plan-Structured Deep Neural Network Models for Query Performance Prediction presented at VLDB 2019, and the code for training/testing on
-
TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with Postgres
The TPC-H data are generated with
./dbgen -s <scale factor>
.The TPC-H queries are generated using
./qgen <query_id> -r <seed number>
where varying seed random number is used to generate different queries from a template.Tables in PostgresSQL are created using dss.ddl and then have indexes created with tpch-postgres-index-ddl.sql.
The query plan structure and query performance metrics are retrieved as json objects by running the generated queries above with the
explain (analyze, format JSON, verbose)
statement prepended. -
TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with NoisePage
-
TPC-C queries and smallbank dataset generated using OLTP and benchmarked with NoisePage
- Linux or macOS
- Python 3
-
Cloning the repo:
git clone https://github.com/rabbit721/QPPNet.git cd QPPNet
-
Install the required python packages:
- For pip:
pip install -r requirements.txt
- For conda:
conda env create -f environment.yml
- For pip:
-
TPC-H benchmarked with PostgresSQL
# Under directory dataset/postgres_tpch_dataset wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch0p1g.zip && unzip psqltpch0p1g.zip wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch1g.zip && unzip psqltpch1g.zip wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch10g.zip && unzip psqltpch10g.zip
-
TPC-H benchmarked with NoisePage:
Data files already located under directory
dataset/terrier_tpch_dataset
asexecution_0p1G.csv
,execution_1G.csv
, andexecution_10G.csv
-
TPC-C and smallbank benchmarked with NoisePage:
Data files already located under directory
dataset/oltp_dataset
astpcc_pipeline.csv
andsb_pipeline.csv
-
On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with PostgresSQL
python3 main.py --dataset PSQLTPCH -s 0 -t 250000 --batch_size 128 -epoch_freq 1000 --lr 2e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/postgres_tpch_dataset/tpch1g/900-exp_res_by_temp/
-
On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with NoisePage
python3 main.py --dataset TerrierTPCH -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 1e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/terrier_dataset/execution_1G.csv
-
On TPC-C dataset generated using OLTP with SF=1 and benchmarked with NoisePage
python3 main.py --dataset OLTP -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 5e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/oltp_dataset/tpcc_pipeline.csv
-
Getting a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL:
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/psqltpch_epoch4000.zip
-
Getting a model trained for 20000 epochs on TPC-H SF=1 dataset benchmarked with NoisePage:
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/terriertpch_epoch20000.zip
-
Getting a model trained for 10000 epochs on TPC-C dataset generated with OLTP and benchmarked with NoisePage:
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/tpcc_epoch10000.zip
When testing, please make sure that trained models are saved in ./saved_model
and the mean and range values of the train data are provided with the flag '--mean_range_dict' in order to normalize the input.
The '-s' flag is interpreted as an integer and is used to specify the epoch number of the saved model to be tested.
-
Testing a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL on TPC-H SF=10 dataset benchmarked with PostgresSQL.
python3 main.py --test_time --dataset PSQLTPCH -s 4000 --mean_range_dict mean_range_dict.pickle --data_dir ./dataset/postgres_tpch_dataset/tpch10G/900-exp_res_by_temp/
-
Testing a model trained for 10000 epochs on the TPC-C benchmark on the smallbank dataset. Please make sure that models are saved in
./saved_model
python3 main.py --test_time --dataset OLTP -s 10000 --data_dir ./dataset/oltp_dataset/sb_pipeline.csv