COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

UPDATE: Reproducing results and figures

Follow the detailed instructions from instructions.pdf.

Experimental Setup
Install MapD Database System
Create Database and Load Data
Download IMDB Dataset
Download Join Order Benchmark
Pre-build Sketch Templates
Run a Query
Questions
Acknowledgments
References
Citation

1. Experimental Setup

COMPASS was developed in the following environment:

Machine: 2x Intel(R) Xeon(R) CPU E5-2660 v4 and NVIDIA Tesla K80
OS: Ubuntu 16.04 LTS
CUDA v10, Clang/LLVM

2. Install MapD Database System

COMPASS is built as an extension on top of a clone of MapD System, version 3.6.1 and later it was rebranded to OmniSciDB. Find location where the project is cloned and follow the commands below:

unzip compass_query_optimizer-master.zip
cd compass_query_optimizer-master/mapd-core/
source scripts/mapd-deps-ubuntu.sh
cd ThirdParty/librdkafka/
source /usr/local/mapd-deps/mapd-deps.sh && cd ../../
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make clean & make -j 56

The number of threads is 56. COMPASS has hyper parameters such as sketch size in mapd-core/Catalog/COMPASS_init_variables.txt. See variables_details.txt for more details about COMPASS hyper parameters.

3. Create Database and Load Data

All databases created by the user are at .mapd/data/. The default user is mapd.

Run server: ./bin/mapd_server --data ~/.mapd/data/
Run client and create database called imdb:

./bin/mapdql mapd -u mapd -p HyperInteractive
CREATE DATABASE imdb (owner = 'mapd');
\q
Run client and create schemas: cat schematext.sql | ./bin/mapdql imdb -u mapd -p HyperInteractive
Stop server ctrl + c on keyboard

See dataset/README.md for more details about IMDB schemas.

4. Download IMDB Dataset

The dataset that was used is Internet Movie Data Base (IMDB). The data is publicly availabe in text files and open-source imdbpy package was used to transform txt files to CSV files in [1]. See for more details in here. The 3.6 DB snapshot is from May 2013 and it can be downloaded here. The dataset includes 21 CSV files i.e. 21 relations in total. It also includes queries to create the necessary relations written in schema.sql or schematext.sql files.

5. Download Join Order Benchmark

The workload that COMPASS's performance was evaluated based on Join Order Benchmark(JOB). JOB consists of 113 queries in total including 33 query families that each family queries differ only in selection predicates. See queries/README.md for more details about workload.

6. Pre-build Sketch Templates

In the case of relations with no selection predicates, building the sketches can be avoided by using pre-built sketch templates. This is an additional optimization in order to keep the overhead low. See sketch-templates/README.md for more details about building templates. We already uploaded the prebuilt sketches for sketch sizes 73 x 1021 on imdb workload. However one needs to re-create the templates in case of changing sketch sizes by running the queries in sketch-templates.

7. Run Query

Current implementation is only on GPU part of MapD system. However one can implement COMPASS on CPU mode. In order to run a query follow the instructions below:

Run server: ./bin/mapd_server --data ~/.mapd/data/
Run client: ./bin/mapdql imdb -u mapd -p HyperInteractive
Choose GPU mode: \gpu
Turn on COMPASS: \fpd
Paste query from a query file.
Stop client: \q
Stop server: ctrl + c on keyboard

8. Questions

If you have questions, please contact:

Yesdaulet Izenov [yizenov@ucmerced.edu], (https://yizenov.github.io/)
Asoke Datta [adatta2@ucmerced.edu], (https://asoke26.github.io/adatta2/)
Florin Rusu [frusu@ucmerced.edu], (https://faculty.ucmerced.edu/frusu/)

9. Acknowledgments

This work is supported by NSF award (number 2008815) and by the U.S. Department of Energy Early Career Award (DOE Career).

10. References

[1] Query optimization through the looking glass, and what we found running the Join Order Benchmark
[2] Filter Push-Down extension for MapD

11. Citation

@misc{compass-github,
  author = {Yesdaulet Izenov},
  title = "{The COMPASS Query Optimizer}",
  howpublished = "\url{https://github.com/yizenov/compass_query_optimizer}"
}

yizenov/compass_query_optimizer