This repository provides a basic implementation of relational algebra to enable exploration of different aspects of a relational query processing engine.
The implementation is distributed over the following files:
queryprocessor/relation.py
: implements the structure of relations in a database:- Attribute: Defines individual attributes with names and data types.
- Schema: Describes a set of attributes, akin to a table structure.
- Relation: Represents a table with methods to manage tuples, simulate storage types (memory, SSD, HDD), and create/search indexes.
queryprocessor/algebra.py
: represents a query tree consisting of relational algebra operators. This only represents the structure of the tree -- not the implementation of the individual operators.queryprocessor/operator.py
: implements several relational algebra operators to illustrate differentiation in performance and complexity. Contains aQueryPlan
object which implements a basic "execute" function to pull tuples out of the query plan.queryprocessor/benchmark.py
: emits a Pandas DataFrame containing benchmarking results for running a set of query plans on a set of relations
The operator.py
file implements the following relational algebra operators:
- Scan
ScanOp
: linear scan over relationIndexScanOp
: scans tuples which have a given value in an indexed column
- Select
SelectOp
: naive selection; uses a Python expression as the predicate so useyear == 1970
instead ofyear = 1970
OrderedSelectOp
: takes advantage of a sorted relation to skip rows on an equality selection
- Project
ProjectOp
: simple projection operator
- Join
JoinOp
: nested loop joinHashJoinOp
: hash join with build and probe phases
- Rename
RenameOp
: simple rename operator
- OrderByOp
OrderByOp
: simple ordering operator; materializes the full underlying relation to perform the sort
The following features are not included in this repository:
- SQL parser
- query planner