kubecon-etcd-metrics-lab

Prerequisites

Note: the image is built only for linux/amd64 and linux/arm64. It has been tested on macOS and Linux.

Start a 3-node etcd cluster with Grafana and Prometheus:

docker-compose -f docker-compose-etcd.yml -f docker-compose-metrics.yml up --force-recreate -V

To verify:

docker ps --all

Output:

~ docker ps --all
CONTAINER ID  IMAGE                                  COMMAND               CREATED         STATUS         PORTS                                              NAMES
8c2060085521  docker.io/bkanivets/etcd:v3.5.9  --name=etcd-1 --i...  26 seconds ago  Up 21 seconds  0.0.0.0:2379->2379/tcp, 0.0.0.0:11180->11180/tcp   kubecon-etcd-metrics-lab_etcd-1_1
e89e0d64213e  docker.io/bkanivets/etcd:v3.5.9  --name=etcd-2 --i...  25 seconds ago  Up 20 seconds  0.0.0.0:21180->11180/tcp, 0.0.0.0:22379->2379/tcp  kubecon-etcd-metrics-lab_etcd-2_1
163a3b97507f  docker.io/bkanivets/etcd:v3.5.9  --name=etcd-3 --i...  24 seconds ago  Up 19 seconds  0.0.0.0:31180->11180/tcp, 0.0.0.0:32379->2379/tcp  kubecon-etcd-metrics-lab_etcd-3_1
81db7802ceba  docker.io/prom/prometheus:latest       --config.file=/et...  23 seconds ago  Up 18 seconds  0.0.0.0:9090->9090/tcp                             kubecon-etcd-metrics-lab_prometheus_1
f04a2e245717  docker.io/grafana/grafana:latest                             21 seconds ago  Up 17 seconds  0.0.0.0:3000->3000/tcp                             kubecon-etcd-metrics-lab_grafana_1
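`docker ps` only shows that the containers are up. As a rough extra check, the sketch below asks each member whether it considers itself healthy. It assumes every member serves `GET /health` on its client port and answers with a JSON body containing `"health":"true"` when ready; the function names `is_healthy` and `check_members` are hypothetical, and the ports are the client ports from the output above.

```shell
# Assumption: each etcd member answers GET /health on its client port
# with a JSON body such as {"health":"true","reason":""} when it is ready.

is_healthy() {
  # Reads a /health response body on stdin; succeeds only if it reports "true".
  grep -q '"health" *: *"true"'
}

check_members() {
  # Client ports of etcd-1, etcd-2, etcd-3 as published in `docker ps`.
  for port in 2379 22379 32379; do
    if curl -s "http://127.0.0.1:${port}/health" | is_healthy; then
      echo "127.0.0.1:${port} healthy"
    else
      echo "127.0.0.1:${port} NOT healthy"
    fi
  done
}
```

Calling `check_members` once the containers are up prints one line per member.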

Try running benchmark:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --key-space-size=100000 --total=10000
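The benchmark invocation above recurs throughout the lab with only the trailing arguments changing. A small wrapper along the following lines can cut the repetition. The function name `bench` and the `DRY_RUN` switch are hypothetical conveniences, not part of the lab; the image, endpoints, and flags are copied from the command above.

```shell
# Hypothetical helper: wraps the lab's "docker run ... benchmark" command.
IMAGE="docker.io/bkanivets/etcd:v3.5.9"
ENDPOINTS="127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379"

bench() {
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty,
  # so the command is printed instead of executed.
  ${DRY_RUN:+echo} docker run --network=host -it --entrypoint benchmark --rm "$IMAGE" \
    --endpoints="$ENDPOINTS" "$@"
}

# Print the initial put benchmark without running it.
DRY_RUN=1 bench --clients=100 --conns=100 put --sequential-keys --key-space-size=100000 --total=10000
```

Without `DRY_RUN`, the same call runs the benchmark against the cluster; the later scenarios then only differ in the arguments passed to `bench`.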

Check out default etcd dashboard.

Grafana credentials are: user admin, password foorbar

Scenario 0: Puts and Ranges

Explore sequence diagram for Puts. Run put benchmark with 1000 clients:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=1000 --conns=1000 put --sequential-keys --key-space-size=100000 --total=100000000

Observe behavior at Puts dashboard.

Explore sequence diagram for Ranges.

Run range benchmark with 100 clients (while prior benchmark is still running):

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range / --total=100000

Observe behavior at Ranges dashboard.

Scenario 1: delay fsync

Run put benchmark with 1000 clients (if prior benchmark stopped):

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=1000 --conns=1000 put --sequential-keys --key-space-size=100000 --total=1000000

Add small delay:

curl http://127.0.0.1:11180/walBeforeFdatasync -XPUT -d'sleep(100)'
curl http://127.0.0.1:21180/walBeforeFdatasync -XPUT -d'sleep(100)'
curl http://127.0.0.1:31180/walBeforeFdatasync -XPUT -d'sleep(100)'

Observe behavior at Puts dashboard.

Add large delay:

curl http://127.0.0.1:11180/walBeforeFdatasync -XPUT -d'sleep(1000)'
curl http://127.0.0.1:21180/walBeforeFdatasync -XPUT -d'sleep(1000)'
curl http://127.0.0.1:31180/walBeforeFdatasync -XPUT -d'sleep(1000)'

Run range benchmark with 100 clients (while 'put' benchmark is still running):

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 range / --total=100000

Observe behavior at Ranges dashboard.

Try range benchmark with --consistency=s:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 --consistency=s range / --total=100000
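The failpoint calls in this scenario differ only in the port and the delay, so they can be folded into a loop. The sketch below does that and also shows one way to remove the delay again without restarting the cluster. It assumes the failpoint endpoint follows gofail's HTTP conventions, where `PUT` sets a failpoint and `DELETE` deactivates it; `set_failpoint`, `clear_failpoint`, and `DRY_RUN` are hypothetical names.

```shell
# Failpoint HTTP ports of etcd-1, etcd-2, etcd-3, as published in `docker ps`.
FAILPOINT_PORTS="11180 21180 31180"

set_failpoint() {
  # $1 = failpoint name, $2 = action, e.g. 'sleep(100)'
  for port in $FAILPOINT_PORTS; do
    ${DRY_RUN:+echo} curl "http://127.0.0.1:${port}/$1" -XPUT -d "$2"
  done
}

clear_failpoint() {
  # Assumption: a gofail-style endpoint deactivates the failpoint on HTTP DELETE.
  for port in $FAILPOINT_PORTS; do
    ${DRY_RUN:+echo} curl "http://127.0.0.1:${port}/$1" -XDELETE
  done
}

# Dry run: print the requests instead of sending them
# (echo drops the quoting around the action).
DRY_RUN=1 set_failpoint walBeforeFdatasync 'sleep(100)'
DRY_RUN=1 clear_failpoint walBeforeFdatasync
```

Dropping `DRY_RUN=1` sends the requests to all three members.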

Scenario 2: delay network

Stop cluster after scenario 1.

Check out -rx-delay in docker-compose-etcd-bridge.yml. It should be set to 1000ms.

Start cluster with bridge interface:

docker-compose -f docker-compose-etcd-bridge.yml -f docker-compose-metrics.yml up --force-recreate -V

Check Peers dashboard.

Run put benchmark with 1000 clients:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=1000 --conns=1000 put --sequential-keys --key-space-size=100000 --total=100000

Observe behavior at Puts dashboard.

Scenario 3: reaching DB size limit

Stop the previous cluster and start a new one:

docker-compose -f docker-compose-etcd.yml -f docker-compose-metrics.yml up --force-recreate -V

Run put benchmark with increased val size:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=10000 --key-space-size=100000 --total=10000000

The default limit is 2 GB. Observe behavior at Puts dashboard.
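A back-of-the-envelope estimate shows roughly when the limit should be reached. The sketch below takes the quota as 2 GiB (an assumption about how the 2 GB figure is rounded) and counts only value bytes, ignoring key bytes and backend overhead, so it gives an upper bound on the number of puts rather than an exact figure.

```shell
quota_bytes=$(( 2 * 1024 * 1024 * 1024 ))   # assumed default quota: 2 GiB
val_size=10000                               # --val-size from the command above
puts_to_fill=$(( quota_bytes / val_size ))   # integer division, value bytes only
echo "$puts_to_fill"                         # prints 214748
```

With `--total=10000000`, the benchmark is therefore expected to hit the limit long before it finishes.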

Scenario 4: compaction

Compaction docs.

Explore sequence diagram for Compaction.

Stop the previous cluster and start a new one:

docker-compose -f docker-compose-etcd.yml -f docker-compose-metrics.yml up --force-recreate -V

Run put benchmark with compaction:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=100 --key-space-size=100000 --total=10000000 --compact-index-delta=10000 --compact-interval=10s

Observe behavior at Compaction dashboard.

Compare database size total/in_use when running put benchmark without compaction:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=100 --key-space-size=100000 --total=10000000
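The total and in-use sizes can also be read straight from a member's metrics endpoint instead of the dashboard. The sketch below assumes the member serves Prometheus text-format metrics at `/metrics` on its client port and exposes the gauges `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`; the function name `db_sizes` is hypothetical.

```shell
db_sizes() {
  # Reads Prometheus text-format metrics on stdin and prints the two size gauges.
  # Assumption: the metric names below match those exported by this etcd version.
  awk '
    $1 == "etcd_mvcc_db_total_size_in_bytes"        { total = $2 }
    $1 == "etcd_mvcc_db_total_size_in_use_in_bytes" { in_use = $2 }
    END { printf "total=%s in_use=%s\n", total, in_use }
  '
}

# Usage against a running member (not executed here):
#   curl -s http://127.0.0.1:2379/metrics | db_sizes
```

A large gap between the two values after compaction indicates space that is free inside the database file but not yet returned to the filesystem.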

Generating ErrTooManyRequests (optional)

Run put benchmark with increased value size and compaction:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=10000 --key-space-size=100000 --total=10000000 --compact-index-delta=1000 --compact-interval=30s

Scenario 5: defragmentation

For explanation of defragmentation process see docs.

Stop the previous cluster and start a new one:

docker-compose -f docker-compose-etcd.yml -f docker-compose-metrics.yml up --force-recreate -V

Increase db size by running put benchmark:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=10000 --key-space-size=100000 --total=101000

Run compaction once:

docker run --network="host" -it --entrypoint etcdctl --rm docker.io/bkanivets/etcd:v3.5.9 compact 100000 --endpoints=127.0.0.1:2379

Run defrag on non-leader:

docker run --network="host" -it --entrypoint etcdctl --rm docker.io/bkanivets/etcd:v3.5.9 defrag --endpoints=127.0.0.1:32379
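The defrag command above targets 127.0.0.1:32379 on the assumption that this member is not the leader, which may not hold after an election. One way to check is to filter the output of `etcdctl endpoint status`. The sketch below assumes the default "simple" output format with comma-separated fields and the leader flag in the fifth field, as in etcdctl v3.5 to the best of my knowledge; `non_leaders` is a hypothetical name.

```shell
non_leaders() {
  # Reads `etcdctl endpoint status` output (default "simple" format) on stdin.
  # Assumption: fields are separated by ", " and field 5 is the IS LEADER flag.
  awk -F', ' '$5 == "false" { print $1 }'
}

# Usage (not executed here):
#   docker run --network=host --entrypoint etcdctl --rm docker.io/bkanivets/etcd:v3.5.9 \
#     endpoint status --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 | non_leaders
```

Any endpoint it prints can be used as the `--endpoints` value for a non-leader defrag.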

Observe behavior at Defrag dashboard.

Run defragmentation with delay

Make sure that put benchmark is running:

docker run --network="host" -it --entrypoint benchmark --rm docker.io/bkanivets/etcd:v3.5.9 --endpoints=127.0.0.1:2379,127.0.0.1:22379,127.0.0.1:32379 --clients=100 --conns=100 put --sequential-keys --val-size=100 --key-space-size=100000 --total=10000000

Add delay:

curl http://127.0.0.1:11180/defragBeforeCopy -XPUT -d'sleep(10000)'
curl http://127.0.0.1:21180/defragBeforeCopy -XPUT -d'sleep(10000)'
curl http://127.0.0.1:31180/defragBeforeCopy -XPUT -d'sleep(10000)'

Run defrag for non-leader:

docker run --network="host" -it --entrypoint etcdctl --rm docker.io/bkanivets/etcd:v3.5.9 defrag --endpoints=127.0.0.1:32379

Observe behavior at Defrag dashboard. Check out the gRPC error rate.

Run defrag for leader:

docker run --network="host" -it --entrypoint etcdctl --rm docker.io/bkanivets/etcd:v3.5.9 defrag --endpoints=127.0.0.1:2379

Glossary

  • Compaction : etcd keeps an exact history of its keyspace. Compaction is the process of dropping all information about keys superseded prior to a given keyspace revision.
  • KV : Key-Value pair.
  • WAL : Write-Ahead Log.
  • MVCC : Multi-Version Concurrency Control.
  • boltDB : bolt database, the storage backend used by etcd. More info here.
  • raft : Raft is a consensus algorithm that is designed to be easy to understand. It's equivalent to Paxos in fault-tolerance and performance. More info here.
  • txn : Transaction, a compound operation type in etcd, that can be any* combination of Read/Write/Delete.
  • gRPC : gRPC is a modern open source high performance Remote Procedure Call (RPC) framework. More info here.
  • bidi : bi-directional.
  • fsync : fsync() transfers all modified in-core data of the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted.

Citations/References