Python-based Distributed Database
Sidewinder is a proof-of-concept distributed database written in Python (using asyncio). The server distributes shards of data to a number of workers to "divide and conquer" OLAP database workloads.
It consists of a server, workers, and a client (where you can run interactive SQL commands).
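The server/worker split can be pictured as a scatter/gather loop. The sketch below is a minimal, self-contained simulation that uses only asyncio. It is not Sidewinder's actual code: the in-memory `SHARDS` lists and the `worker` and `server` coroutines are hypothetical stand-ins for real shards, worker processes, and network calls.

```python
import asyncio

# Hypothetical in-memory "shards": each list stands in for one shard's rows.
SHARDS = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]


async def worker(shard: list[int]) -> int:
    """Simulated worker: computes a partial COUNT(*) over its own shard."""
    await asyncio.sleep(0)  # stand-in for network I/O and query execution
    return len(shard)


async def server(shards: list[list[int]]) -> int:
    """Simulated server: scatters one shard per worker, then gathers and combines."""
    partial_counts = await asyncio.gather(*(worker(s) for s in shards))
    # Partial COUNT(*) results are combined by summing them.
    return sum(partial_counts)


if __name__ == "__main__":
    print(asyncio.run(server(SHARDS)))  # prints 9
```

Passing only some of the shards (for example `SHARDS[:2]`) returns a partial count, which mirrors what happens when fewer workers than shards are running.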
Sidewinder will NOT distribute queries that do not contain aggregates; it runs those on the server side.
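The routing rule can be illustrated with a deliberately naive check. Sidewinder itself uses the PostgreSQL parser; the regex-based `should_distribute` helper below is a hypothetical stand-in that only shows the decision, and it would misclassify SQL that mentions aggregate names inside string literals or comments.

```python
import re

# Aggregate functions this naive sketch looks for (hypothetical, incomplete list).
AGGREGATE_PATTERN = re.compile(r"\b(count|sum|avg|min|max)\s*\(", re.IGNORECASE)


def should_distribute(sql: str) -> bool:
    """Return True if the query appears to contain an aggregate call.

    A regex cannot truly parse SQL (it is fooled by string literals and
    comments), so this only illustrates the routing rule; it is not how
    Sidewinder makes the decision.
    """
    return AGGREGATE_PATTERN.search(sql) is not None
```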
Sidewinder uses Apache Arrow with WebSockets over TLS for secure communication between the server, worker(s), and client(s).
It uses DuckDB as its SQL execution engine, and the PostgreSQL parser to work out how to combine the results from distributed workers.
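How distributed results are combined depends on the aggregate. The sketch below shows the standard merge rules for partial aggregates, a general technique and not necessarily Sidewinder's exact rewrite logic; the `Partial` class and the `partial_of` and `combine` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical partial result one worker returns for a single column."""
    count: int
    total: float
    minimum: float
    maximum: float


def partial_of(values: list[float]) -> Partial:
    """What a worker would compute over its own shard (values assumed non-empty)."""
    return Partial(len(values), sum(values), min(values), max(values))


def combine(partials: list[Partial]) -> dict[str, float]:
    """Merge per-worker partials into global aggregates."""
    count = sum(p.count for p in partials)  # COUNT -> SUM of counts
    total = sum(p.total for p in partials)  # SUM   -> SUM of sums
    return {
        "count": count,
        "sum": total,
        "min": min(p.minimum for p in partials),  # MIN -> MIN of mins
        "max": max(p.maximum for p in partials),  # MAX -> MAX of maxes
        # AVG is NOT the average of per-worker averages; rebuild it from SUM / COUNT.
        "avg": total / count,
    }
```

With shards `[1, 2, 3]` and `[10]`, averaging the two per-worker averages would give (2 + 10) / 2 = 6, while the correct average is 16 / 4 = 4, which is why AVG has to be decomposed into SUM and COUNT before distribution.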
You can install sidewinder-db from PyPI or from source.
To install from PyPI:
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
pip install sidewinder-db
To install from source:
git clone https://github.com/prmoore77/sidewinder
cd sidewinder
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel
# Install Sidewinder-DB - in editable mode with dev dependencies
pip install --editable .[dev]
For the following commands, if you are running from source and using --editable
mode (for development purposes), you will need to set the PYTHONPATH environment variable as follows:
export PYTHONPATH=$(pwd)/src
Bootstrap the environment by creating a security user list (password file), a TLS certificate keypair, and a sample TPC-H dataset with 11 shards:
. .venv/bin/activate
sidewinder-bootstrap \
--client-username=scott \
--client-password=tiger \
--worker-password=united \
--tpch-scale-factor=1 \
--shard-count=11
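The bootstrap step splits the sample data into 11 shards. How sidewinder-bootstrap actually partitions the rows is not described here, so the sketch below assumes simple round-robin partitioning; `split_into_shards` is a hypothetical helper, not part of Sidewinder.

```python
def split_into_shards(rows: list, shard_count: int) -> list[list]:
    """Round-robin partitioning: row i goes to shard i % shard_count."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: list[list] = [[] for _ in range(shard_count)]
    for i, row in enumerate(rows):
        shards[i % shard_count].append(row)
    return shards
```

Under this assumption each shard holds roughly 1/11 of the rows, so a query answered by only some of the shards sees only that fraction of the data.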
Run sidewinder locally from the root of the repo (use the --help option on the executables below for option details):
. .venv/bin/activate
sidewinder-server
Open another terminal, then start a single worker (using the same worker password you used in the bootstrap command above) with command:
. .venv/bin/activate
sidewinder-worker --tls-roots=tls/server.crt --password=united
Note: you can run up to 11 workers with this example configuration. To do so, run this instead of starting a single worker:
. .venv/bin/activate
for x in {1..11}
do
sidewinder-worker --tls-roots=tls/server.crt --password=united &
done
To kill the workers later, run (from the same terminal, since jobs only lists the current shell's background jobs):
kill $(jobs -p)
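If you prefer not to rely on shell job control, the workers can also be started and stopped from a small Python script. This is a sketch, not a Sidewinder tool: it reuses only the `sidewinder-worker` flags shown above, the helper names are hypothetical, and it assumes the virtual environment is active so that `sidewinder-worker` is on your PATH.

```python
import subprocess


def worker_command(password: str, tls_roots: str = "tls/server.crt") -> list[str]:
    """Build the argv for one sidewinder-worker process (flags from the commands above)."""
    return ["sidewinder-worker", f"--tls-roots={tls_roots}", f"--password={password}"]


def start_workers(count: int, password: str) -> list[subprocess.Popen]:
    """Launch `count` worker processes in the background."""
    return [subprocess.Popen(worker_command(password)) for _ in range(count)]


def stop_workers(procs: list[subprocess.Popen]) -> None:
    """Terminate the workers (SIGTERM on POSIX) and wait for them to exit."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

For example, `procs = start_workers(11, password="united")` starts 11 workers, and `stop_workers(procs)` shuts them down again.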
Open another terminal, then connect with the client - using the same client username/password you used in the bootstrap command above:
. .venv/bin/activate
sidewinder-client --tls-roots=tls/server.crt --username=scott --password=tiger
Then, while in the client, you can run a sample query that will be distributed to the worker(s) (if at least one is running), for example:
SELECT COUNT(*) FROM lineitem;
Note: if you are running fewer than 11 workers, your answer will only reflect n/11 of the data (where n is the worker count). We will add delta processing at a later point...
Queries without aggregates run on the server side, for example:
SELECT * FROM region;
SELECT * FROM lineitem LIMIT 5;
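Note: there are TPC-H queries in the tpc-h_queries folder you can run...
To turn distributed mode OFF in the client (so that sidewinder does NOT distribute queries to the workers):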
.set distributed = false;
To turn summarization mode OFF in the client (so that sidewinder does NOT summarize the workers' results - this only applies to distributed mode):
.set summarize = false;
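As an illustration of what the summarize setting changes, the hypothetical `present` helper below takes the per-worker partial counts of a COUNT(*) and either combines them into a single row or returns one row per worker. This is an assumption about the shape of the output, not Sidewinder's actual result format.

```python
def present(partial_counts: list[int], summarize: bool) -> list[tuple[int]]:
    """Hypothetical view of COUNT(*) results with summarization on or off."""
    if summarize:
        return [(sum(partial_counts),)]  # one combined row
    return [(count,) for count in partial_counts]  # one row per worker
```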
Install the DuckDB CLI version 1.1.0, and make sure the executable is on your PATH.
Platform Downloads:
Linux x86-64
Linux arm64 (aarch64)
macOS Universal
To bump the package's patch version, run:
bumpver update --patch