Optimizing database queries with array programming

HorsePower

HorsePower is a framework for optimizing database queries on modern hardware. At its core is HorseIR, an array-based intermediate representation (IR) for database queries. Because queries are expressed in HorseIR, sophisticated compiler optimizations can be applied to database operations. Array programming also exposes fine-grained parallelism, which offers a promising route to performance speedups.
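To make the array-programming idea concrete, here is a minimal C++ sketch of how a query such as `SELECT SUM(price) FROM items WHERE quantity < 24` could be written as a chain of whole-array primitives. The primitive names (`lt`, `compress`, `sum`), the table, and the columns are hypothetical and chosen for illustration only; this is not HorseIR syntax or the HorsePower API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element-wise "less than": one flag per input element.
std::vector<std::uint8_t> lt(const std::vector<double>& xs, double bound) {
    std::vector<std::uint8_t> mask(xs.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(xs.size());
    // Iterations are independent, so the loop can run in parallel.
    // The pragma is simply ignored when OpenMP is not enabled.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        mask[i] = xs[i] < bound ? 1 : 0;
    }
    return mask;
}

// Hypothetical "compress": keep the elements of xs whose flag is set.
std::vector<double> compress(const std::vector<std::uint8_t>& mask,
                             const std::vector<double>& xs) {
    assert(mask.size() == xs.size());
    std::vector<double> out;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (mask[i]) out.push_back(xs[i]);
    }
    return out;
}

// Hypothetical reduction: add up all elements.
double sum(const std::vector<double>& xs) {
    double total = 0.0;
    for (double x : xs) total += x;
    return total;
}

// SELECT SUM(price) FROM items WHERE quantity < bound,
// expressed as a composition of array primitives over columns.
double sum_price_where_quantity_lt(const std::vector<double>& quantity,
                                   const std::vector<double>& price,
                                   double bound) {
    return sum(compress(lt(quantity, bound), price));
}
```

Each primitive operates on an entire column at once, which is what makes the element-wise steps natural candidates for data-parallel execution.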

Project Overview

Figure 1. The workflow of the HorsePower framework.

We started this project from scratch in summer 2017. Figure 1 shows the workflow of the HorsePower framework. One candidate source language is our Horse language, an extension of standard SQL designed for data analytics. At the current stage, we accept execution plans from standard SQL database queries as well as MATLAB code. A front end parses the source code and transforms it into HorseIR. Static analyses and code optimizations are then performed before target code is generated, and multiple back ends are supported. Alternatively, an interpreter allows programs to be run directly.

In HorsePower, we focus on the following parts.

- Design and implementation of an array-based intermediate representation (IR)
- Static analysis for an array-based IR (i.e., HorseIR)
- Query optimization through compiler optimizations
- Fine-grained primitive functions and highly tuned libraries
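As an illustration of how a compiler optimization can help a query built from fine-grained primitives, the sketch below applies loop fusion, a standard compiler technique, to the expression `SUM(price * (1 - discount))`. The function names are hypothetical, and the code is not taken from HorsePower's optimizer; it only shows the kind of transformation an array-based IR makes possible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused form: every primitive materializes a full temporary array.
double revenue_unfused(const std::vector<double>& price,
                       const std::vector<double>& discount) {
    assert(price.size() == discount.size());
    const std::size_t n = price.size();

    std::vector<double> keep(n);      // temporary 1: 1 - discount
    for (std::size_t i = 0; i < n; ++i) keep[i] = 1.0 - discount[i];

    std::vector<double> scaled(n);    // temporary 2: price * keep
    for (std::size_t i = 0; i < n; ++i) scaled[i] = price[i] * keep[i];

    double total = 0.0;               // final reduction
    for (std::size_t i = 0; i < n; ++i) total += scaled[i];
    return total;
}

// Fused form: the three loops are merged into one pass,
// so no temporary arrays are allocated or re-read from memory.
double revenue_fused(const std::vector<double>& price,
                     const std::vector<double>& discount) {
    assert(price.size() == discount.size());
    double total = 0.0;
    for (std::size_t i = 0; i < price.size(); ++i) {
        total += price[i] * (1.0 - discount[i]);
    }
    return total;
}
```

Both functions perform the same arithmetic in the same order; the fused version makes a single pass over the input columns and allocates no intermediate arrays.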

Installation

Download the repository

git clone git@github.com:Sable/HorsePower.git

Set up environment variables

cd HorsePower && source ./setup_env.sh

Set Up Library

Install the libraries with the following command (takes about 13 minutes):

(cd ${HORSE_LIB_FOLDER} && sh deploy_linux.sh)

After installation, the following new folders are created:

- include
- lib
- pcre2

Note: we recommend gcc 8.1.0 or higher. The additional library uuid-dev may also be required during installation.

Set Up Data

The default data path for TPC-H is

${HORSE_BASE}/data/tpch

To generate datasets at different scale factors, run

cd data/tpch
./run.sh deploy       ## Read instructions and update Makefile
./run.sh gendb 1      ## Generate database and save to data/tpch/db1

For a specific scale factor, for example 1, the database path is

${HORSE_BASE}/data/tpch/db1

It contains one .tbl file per table:

${HORSE_BASE}/data/tpch/db1/*.tbl
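The .tbl files produced by the TPC-H data generator are plain text with one row per line, fields separated by `|`, and a trailing `|` at the end of each line. The sketch below shows one way such a line could be split in C++. The helper names and the sample row are hypothetical; this is not HorsePower's own loader, only an aid for inspecting the generated data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split one .tbl line into its fields. Every field, including the last,
// is terminated by '|', so a field is emitted each time '|' is seen.
std::vector<std::string> split_tbl_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == '|') {
            fields.push_back(current);
            current.clear();
        } else if (c != '\r' && c != '\n') {
            current += c;
        }
    }
    // Tolerate a line that lacks the trailing delimiter.
    if (!current.empty()) fields.push_back(current);
    return fields;
}

// The TPC-H REGION table has three columns: key, name, comment.
struct Region {
    int key;
    std::string name;
    std::string comment;
};

// Hypothetical helper: turn one line of region.tbl into a Region.
Region parse_region_row(const std::string& line) {
    const std::vector<std::string> fields = split_tbl_line(line);
    assert(fields.size() == 3);
    return Region{std::stoi(fields[0]), fields[1], fields[2]};
}
```

For example, calling `parse_region_row` on an illustrative line such as `0|AFRICA|sample comment|` yields a `Region` with key 0 and name `AFRICA`.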

Build and Run

We recommend using the latest version, as this project is still under active development.

To see the available run options, type

(cd ${HORSE_SRC_CODE} && ./run.sh)      # show usage

A Brief Summary

| Name        | Notes                      |
|-------------|----------------------------|
| Platform    | Cross-platform             |
| Tools       | C/C++, Flex & Bison        |
| Parallelism | OpenMP/Pthread/CUDA/OpenCL |
| Conventions | docs/conventions           |

Quick Entries

- IR design
- Database TPC-H
- Implementation
- Publications

Copyright and License

Copyright © 2017-2020, Hanfeng Chen, Laurie Hendren and McGill University.