NOTE: This is a fork of the original repository, kept to hold notes made while reading the book at https://howqueryengineswork.com/00-introduction.html. Most notes are under the Kotlin implementation in jvm.
This is the companion repository for the book How Query Engines Work and contains source code for a simple in-memory query engine implemented in Kotlin.
The query engine is designed to be easy to learn and hack on rather than being optimized for performance, scalability, or robustness.
The query engine contains the following components:
- DataFrame API
- SQL Parser
- SQL Query Planner
- Logical Plan
- Query Optimizer
- Physical Plan
- Server
- JDBC Driver
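To make the relationship between the DataFrame API and the logical plan concrete, here is a minimal sketch, assuming a DataFrame is a thin fluent wrapper that builds a tree of logical plan nodes. Every name in it (`LogicalPlan`, `Scan`, `Selection`, `Projection`, `DataFrame`, `formatPlan`) is hypothetical and chosen for illustration; predicates are plain strings here, which is a simplification and not how the repository models them.

```kotlin
// Hypothetical sketch: these names are illustrative, not the repository's classes.

/** A node in a logical plan tree: describes what to compute, not how. */
sealed interface LogicalPlan {
    fun children(): List<LogicalPlan>
}

/** Leaf node: read rows from a data source identified by a path. */
data class Scan(val path: String) : LogicalPlan {
    override fun children(): List<LogicalPlan> = emptyList()
}

/** Keep only rows matching a predicate (a plain string in this sketch). */
data class Selection(val input: LogicalPlan, val predicate: String) : LogicalPlan {
    override fun children(): List<LogicalPlan> = listOf(input)
}

/** Keep only the named columns. */
data class Projection(val input: LogicalPlan, val columns: List<String>) : LogicalPlan {
    override fun children(): List<LogicalPlan> = listOf(input)
}

/** Fluent wrapper: each call wraps the current plan in a new node. */
class DataFrame(val plan: LogicalPlan) {
    fun filter(predicate: String) = DataFrame(Selection(plan, predicate))
    fun project(vararg columns: String) = DataFrame(Projection(plan, columns.toList()))
}

/** Render a plan as an indented tree, one node per line. */
fun formatPlan(plan: LogicalPlan, indent: Int = 0): String {
    val label = when (plan) {
        is Scan -> "Scan: ${plan.path}"
        is Selection -> "Selection: ${plan.predicate}"
        is Projection -> "Projection: ${plan.columns.joinToString(", ")}"
    }
    val line = "  ".repeat(indent) + label + "\n"
    return line + plan.children().joinToString("") { formatPlan(it, indent + 1) }
}
```

Calling `DataFrame(Scan("employee.csv")).filter("state = 'CO'").project("id", "name")` and passing its `plan` to `formatPlan` yields a three-line tree with the projection at the root, the selection beneath it, and the scan as the leaf. Nothing is executed at this point; the tree is only a description that a planner could later turn into a physical plan.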
The following operators are supported:
- Table Scan (Parquet and CSV)
- Projection
- Filter
- Hash Aggregate
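The four operators above can be sketched as small functions over lazy sequences of rows. This is a deliberately simplified, row-based illustration under stated assumptions: the `Row` alias and the functions `scanRows`, `filterRows`, `projectRows`, and `hashAggregateSum` are hypothetical names, the scan reads from an in-memory list rather than Parquet or CSV, and the aggregate handles only a single grouping column with a sum.

```kotlin
// Hypothetical row-based sketch: names are illustrative, not the repository's API.
typealias Row = Map<String, Any?>

/** Table scan stand-in: yields rows from an in-memory list instead of a file. */
fun scanRows(rows: List<Row>): Sequence<Row> = rows.asSequence()

/** Filter: keep only rows for which the predicate holds. */
fun filterRows(input: Sequence<Row>, predicate: (Row) -> Boolean): Sequence<Row> =
    input.filter(predicate)

/** Projection: keep only the named columns, in the given order. */
fun projectRows(input: Sequence<Row>, columns: List<String>): Sequence<Row> =
    input.map { row -> columns.associateWith { row[it] } }

/**
 * Hash aggregate: group rows by one column using a hash map and sum another.
 * Assumes the summed column is non-null and numeric; otherwise the cast throws.
 */
fun hashAggregateSum(
    input: Sequence<Row>,
    groupBy: String,
    sumColumn: String
): Map<Any?, Double> {
    val totals = LinkedHashMap<Any?, Double>()
    for (row in input) {
        val key = row[groupBy]
        val value = (row[sumColumn] as Number).toDouble()
        totals[key] = (totals[key] ?: 0.0) + value
    }
    return totals
}
```

Because each operator takes a sequence and returns a sequence, they compose by nesting: `projectRows(filterRows(scanRows(rows)) { ... }, listOf("name"))`. Only the aggregate has to consume its whole input before producing a result, since a group's total is not known until the last row has been seen.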
The following expressions are supported:
- Literals
- Attributes
- Simple Aggregates (Min, Max, Sum)
- Cast
- Boolean expressions (AND, OR, NOT)
- Simple math expressions (+, -, *, /)
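One way to picture how such expressions are evaluated is as a small tree that is walked recursively against a row. The sketch below is a minimal illustration, not the repository's implementation: the names `Expr`, `Literal`, `Attribute`, `BinaryMath`, `And`, `Or`, `Not`, `CastToDouble`, and `evaluate` are hypothetical, all arithmetic is done on doubles for brevity, cast is limited to a single target type, and aggregates are left out because they operate across rows rather than within one.

```kotlin
// Hypothetical expression tree: names are illustrative, not the repository's classes.
sealed interface Expr

data class Literal(val value: Any?) : Expr
data class Attribute(val name: String) : Expr
data class BinaryMath(val op: Char, val left: Expr, val right: Expr) : Expr
data class And(val left: Expr, val right: Expr) : Expr
data class Or(val left: Expr, val right: Expr) : Expr
data class Not(val input: Expr) : Expr
data class CastToDouble(val input: Expr) : Expr

/**
 * Evaluate an expression against a single row.
 * Assumes operands have the expected types; a wrong type fails with a cast exception.
 */
fun evaluate(expr: Expr, row: Map<String, Any?>): Any? = when (expr) {
    is Literal -> expr.value
    is Attribute -> row[expr.name]
    is BinaryMath -> {
        // Simplification: every numeric operand is widened to Double.
        val l = (evaluate(expr.left, row) as Number).toDouble()
        val r = (evaluate(expr.right, row) as Number).toDouble()
        when (expr.op) {
            '+' -> l + r
            '-' -> l - r
            '*' -> l * r
            '/' -> l / r
            else -> throw IllegalArgumentException("Unsupported operator: ${expr.op}")
        }
    }
    is And -> (evaluate(expr.left, row) as Boolean) && (evaluate(expr.right, row) as Boolean)
    is Or -> (evaluate(expr.left, row) as Boolean) || (evaluate(expr.right, row) as Boolean)
    is Not -> !(evaluate(expr.input, row) as Boolean)
    // Goes through the string form, so "1.5" and 3 both convert; null does not.
    is CastToDouble -> evaluate(expr.input, row).toString().toDouble()
}
```

For a row `mapOf("a" to 6, "b" to 3)`, evaluating `BinaryMath('/', Attribute("a"), Attribute("b"))` gives `2.0`. Using a sealed interface means the `when` in `evaluate` is checked for exhaustiveness by the compiler, so adding a new expression type without handling it is a compile error.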
The Gradle build script uses protobuf-gradle-plugin to generate Java source code from the Ballista protobuf file, which requires the protobuf compiler to be installed.
Use the following instructions to install the protobuf compiler on Ubuntu or similar Linux platforms.
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.4/protobuf-all-3.11.4.tar.gz
tar xzf protobuf-all-3.11.4.tar.gz
cd protobuf-3.11.4/
./configure
make
sudo make install
sudo ldconfig
Then build the Kotlin implementation and publish it to the local Maven repository.
cd jvm
./gradlew publishToMavenLocal
Some of the examples in the book use the following data set.
wget https://nyc-tlc.s3.amazonaws.com/trip+data/yellow_tripdata_2019-12.csv