/VPGAE

Source code of “Learning Vertical Partitions with Graph AutoEncoder”.

Primary LanguagePython

Learning Vertical Partitions with Graph AutoEncoder

Introduction

This is a PyTorch implementation of our proposed vertical partitioning algorithm: VPGAE and VPGAE-B, as described in our paper:

VPGAE: Learning Vertical Partitions with Graph AutoEncoder

Requirements

  • PyTorch (>=1.4.0)
  • PyTorch Geometric
  • numpy
  • matplotlib
  • sklearn
  • more_itertools
  • scipy
  • tqdm
  • pyvis
  • psycopg2
  • copy

Run our experiments

1. Preliminary Experiment

In our paper, we conduct a preliminary experiment to verify the reliability of our cost model. PostgreSQL 11.2 database is utilized in our experiment as the realistic system. First, you need to run generate_wide_table.py file to define a wide table and insert data into it. Then, just run vertical_partition_real_system.py file to see the results. Note that, before running our code, you need to modify the pg config options (such as password and host-IP) in the above two files.

2. RQ2, RQ3 and HAP

We have implemented TPC-H and TPC-DS benchmark (RQ2), random dataset (RQ3), and HAP benchmark (example in the Introduction section) experiments in vertical_partition.py file. The arg "dataset" is available to choose a dataset, so you can directly run this file:

python vertical_partition.py --dataset dataset_name

dataset_name available options: TPC_H, TPC_DS, random_dataset, HAP. By default, this file will run the TPC-H benchmark experiment.

3. RQ4

We also conduct an experiment to verify the performance of our dynamic strategy. Feel free to run vertical_partition_dynamic_workload.py file and wait patiently to see the results.

Implemented methods

  • VPGAE: Our proposed vertical partitioning method using graph model.
  • VPGAE-B: A refined version of VPGAE. We apply a beam search algorithm on the basis of VPGAE to get better solutions.
  • HILLCLIMB: A bottom-up greedy search algorithm. It starts from COLUMN layout and in each iteration, it merges two partitions that provide the highest improvement.
  • OPTIMAL: An exhaustive algorithm that checks all possible partitioning schemes.
  • COLUMN: A special layout that treats one attribute as one partition.
  • ROW: A special layout that treats all attributes as one partition.

Note that, we did not implement NAVATHE, HYRISE and O2P in our project, because these three algorithms have been implemented in Vertical partitioning algorithms used in physical design of databases and we used the partition results generated by their project for comparison.