/kudu-tsbs

Harnesses/scripts for running the TSBS time series benchmark on Kudu

Primary LanguagePython

== Repo setup

./setup.sh

== Starting daemons:

influxdb

./sw/influx/usr/bin/influxd 

victoriametrics

./sw/victoriametrics/victoria-metrics-prod  \
  -retentionPeriod 1000 \
  -search.disableCache \
  -search.maxConcurrentRequests=$[$(nproc) * 2] \
  -search.maxQueryDuration=60s

kudu

# Disable replication since we'll only run one tserver.
kudu master run -fs-wal-dir /data/m -default-num-replicas 1 &

# Only retain a few seconds of history to save disk space and IO.
# The other comparable systems don't allow time-travel or differential
# backups.
kudu tserver run -fs-wal-dir /data/ts --tablet_history_max_age_sec=5 &

# We use large batches on load, so allow the web server to accept them
kudu-tsdb -logtostderr -webserver_max_post_length_bytes=100000000

Clickhouse

Install ClickHouse from APT repos as described on appropriate documentation. Run it with the default configuration.

== Running benchmarks

Start the appropriate server per above, and then run the following with one of influx, kudu, clickhouse, or victoriametrics:

# Set the number of concurrent clients as you like. Twice the number of vCPUs
# seems to maximize throughput for Kudu but has some throughput collapse issues
# on VictoriaMetrics.
NUM_CLIENTS=$[$(nproc)*2]
./env/bin/python -u benchmark.py load influx
./env/bin/python -u benchmark.py run-queries influx --workers=$NUM_CLIENTS

Various results will be emitted into logs/

== Summarizing results

Simple tabulation can be done on the command line:

cat logs/*json | jq '[.workers, .workload, .system, .qps] | @tsv' -r | sort -n | column -t 

or use the included jupyter notebook graphs.ipynb to generate nice graphs.