/table_configuration_optimizer

A linear programming approach to jointly optimize the sorting, compression, indexing, and tiering decision for spatio-temporal workloads

Primary LanguageJupyter Notebook

Table Configuration Optimizer

A linear programming approach to jointly optimize the sorting, compression, indexing, and tiering decision for spatio-temporal workloads.

Setup

To get all dependencies of the Table Configuration Optimizer installed, run

pip install -r requirements.txt

To start the jupyter lab, run

jupyter lab

Additionally, it is required to have a solver, such as gurobi, installed.

Example: Workload

We used a workload consisting of six query templates to evaluate the models based on the data of a transportation network company.

ID Query Frequency Skipped Chunks
q0 SELECT * FROM TABLE WHERE
("driver_id" <= {selectivity of the value: 0.0001}) AND
("status" <= {0.7})
20% 0, 3, 6, 7, 8
q1 SELECT * FROM TABLE WHERE
("timestamp" BETWEEN {0.2}) AND
("latitude" BETWEEN {0.5}) AND
("longitude" BETWEEN {0.5}) AND
("status" <= {0.7})
15%
q2 SELECT * FROM TABLE WHERE
(„driver_id" <= {0.01}) AND
("latitude" BETWEEN {0.1}) AND
("longitude" BETWEEN {0.1})
10% 4, 5, 6, 7, 8, 9
q3 SELECT * FROM TABLE WHERE
("timestamp" <= {0.05}) AND
("latitude" BETWEEN {0.7}) AND
("longitude" BETWEEN {0.7})
25% 0, 1, 2, 6, 7, 8, 9
q4 SELECT * FROM TABLE WHERE
(„driver_id" <= {0.01}) AND
("timestamp" <= {0.4})
15% 1, 2, 3, 4, 5, 6, 7, 8, 9
q5 SELECT * FROM TABLE WHERE
("latitude" BETWEEN {0.01}) AND
("longitude" BETWEEN {0.01}) AND
("timestamp" BETWEEN {0.5})
15% 0, 1, 8, 9

Further workload examples are provided in /data/workloads.