Author: Patrick Hunt (follow me on twitter)
This project provides a simple smoketest client for a ZooKeeper ensemble. Useful for verifying new, updated, existing installations.
I’ve recently added a new zk-latencies.py client, which tests latencies of various operations against the servers in the cluster. This is useful when you setup a new cluster and wish to ensure it will be adequate to the task (in particular operation latencies). Try varying the numbers and sizes of the znodes used in the test, also compare asynchronous latencies vs synchronous (will give you insight into your network latencies).
Note: python’s “time” library is used to time the operations, on some systems the resolution of the timer may be as low as 1 second, depending on the python interpreter implementation for that platform.
From the official site: “ZooKeeper is a high-performance coordination service for distributed applications.”
It exposes common services – such as naming, configuration management, synchronization, and group services – in a simple interface so you don’t have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols.
This project is licensed under the Apache License Version 2.0
The examples below (PYTHONPATH/LD_LIBRARY_PATH) use the included ZooKeeper client c libraries. Modern distributions often include these as part of their packaging. Alternately you can use the following instead:
apt-get install python-dev
apt-get install libzookeeper-mt-dev
pip install zkpython
This tool uses the ZooKeeper python binding to exercise the server. In general the script does the following:
- create a root znode for the test, i.e. /zk-smoketest
- attach a zk session to each server in the ensemble (the —servers list)
- each client will create ephemeral znodes and attach watches on znodes of the other clients
- each client will then delete the the ephemeral znodes it owns, all clients verify watches are triggered properly
- client then cleans up, removing /zk-smoketest znode
This project is a work in progress – so please don’t hesitate to suggest ideas, complain about problems found, or contribute patches.
Usage: zk-smoketest.py [options] Options: -h, --help show this help message and exit --servers=SERVERS comma separated list of host:port (default localhost:2181) --config=CONFIGFILE zookeeper configuration file to lookup servers from --timeout=TIMEOUT session timeout in milliseconds (default 5000) -v, --verbose verbose output, include more detail -q, --quiet quiet output, basically just success/failure
The exit code is 0 on success, non-0 exit code on failure.
Verbose output includes timing information for some critical operations (connection establishment, creating/deleting nodes, etc…).
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-smoketest.py --servers "host:port(,host:port)*"
Note: you may need to compile the zookeeper python binding yourself, this project includes only 32/64-bit Linux binaries. Additionally, the smoketest scripts rely on some changes to the zkpython binding that are not yet released, so if you do compile yourself you will need to compile zkpython from the Apache ZooKeeper SVN trunk (this should be addressed as soon as ZooKeeper 3.3.0 is released).
On success the last line of output will be “Smoke test successful”, otw an exception will be raised.
ZooKeeper client output is written to “cli_log_%d.txt” where %d is the process id of the python process.
Say you have a ZooKeeper ensemble with 5 servers (host1,host2,host3,host4,host5, where each host is running ZK with a client port of 2181):
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-smoketest.py --servers "host1:2181,host2:2181,host3:2181,host4:2181,host5:2181"
or, for a ZK-style configuration file:
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-smoketest.py --config zk.conf
This tool uses the ZooKeeper python binding to test various operation latencies. In general the script does the following:
- create a root znode for the test, i.e. /zk-latencies
- attach a zk session to each server in the ensemble (the —servers list)
- run various (create/get/set/delete) operations against each server, note the latencies of operations
- client then cleans up, removing /zk-latencies znode
By default asynchronous operations are run, specify the “—synchronous” flag if you want synchronous operations to be used. Typically the difference between the two will give you insight into your network latency.
Typically the difference between the two will give you insight into your network latency. def asynchronous_latency_test(s, data): # set znode_count znodes def func(): callbacks = [] for j in xrange(options.znode_count): cb = zkclient.SetCallback() cb.cv.acquire() s.aset(child_path(j), cb, data) callbacks.append(cb) for cb in callbacks: cb.waitForSuccess() timer2(func, "set %7d znodes " % (options.znode_count)) Connecting to 202.45.128.173:2181 Connected in 16 ms, handle is 0 Testing latencies on server 202.45.128.173:2181 using asynchronous calls created 10000 permanent znodes in 552 ms (0.055206 ms/op 18113.937652/sec) set 10000 znodes in 563 ms (0.056307 ms/op 17759.659026/sec) get 10000 znodes in 382 ms (0.038295 ms/op 26112.853640/sec) deleted 10000 permanent znodes in 237 ms (0.023747 ms/op 42109.710233/sec) created 10000 ephemeral znodes in 739 ms (0.073921 ms/op 13527.957526/sec) watched 10000 znodes in 431 ms (0.043104 ms/op 23199.918579/sec) deleted 10000 ephemeral znodes in 297 ms (0.029790 ms/op 33568.074065/sec) notif 10000 watches in 0 ms (included in prior) Latency test complete def synchronous_latency_test(s, data): # set znode_count znodes timer((s.set(child_path(j), data) for j in xrange(options.znode_count)), "set %7d znodes " % (options.znode_count)) Connecting to 202.45.128.173:2181 Connected in 15 ms, handle is 0 Testing latencies on server 202.45.128.173:2181 using syncronous calls created 10000 permanent znodes in 32384 ms (3.238494 ms/op 308.785484/sec) set 10000 znodes in 31845 ms (3.184508 ms/op 314.020252/sec) get 10000 znodes in 8231 ms (0.823185 ms/op 1214.794053/sec) deleted 10000 permanent znodes in 30826 ms (3.082692 ms/op 324.391808/sec) created 10000 ephemeral znodes in 32463 ms (3.246397 ms/op 308.033829/sec) watched 10000 znodes in 9141 ms (0.914148 ms/op 1093.914175/sec) deleted 10000 ephemeral znodes in 31597 ms (3.159710 ms/op 316.484740/sec) notif 10000 watches in 1001 ms (0.100110 ms/op 9988.973392/sec) Latency test complete
This project is a work in progress – so please don’t hesitate to suggest ideas, complain about problems found, or contribute patches.
Usage: zk-latencies.py [options] Options: -h, --help show this help message and exit --servers=SERVERS comma separated list of host:port (default localhost:2181) --cluster=CLUSTER comma separated list of host:port, test as a cluster, alternative to --servers --config=CONFIGFILE zookeeper configuration file to lookup servers from --timeout=TIMEOUT session timeout in milliseconds (default 5000) --root_znode=ROOT_ZNODE root for the test, will be created as part of test (default /zk-latencies) --znode_size=ZNODE_SIZE data size when creating/setting znodes (default 25) --znode_count=ZNODE_COUNT the number of znodes to operate on in each performance section (default 10000) --watch_multiple=WATCH_MULTIPLE number of watches to put on each znode (default 1) --force force the test to run, even if root_znode exists - WARNING! don't run this on a real znode or you'll lose it!!! --synchronous by default asynchronous ZK api is used, this forces synchronous calls -v, --verbose verbose output, include more detail -q, --quiet quiet output, basically just success/failure
The —servers options will test each server in turn while the —cluster option tests a server in the cluster chosen at random.
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-latencies.py --servers "host:port(,host:port)*"
Note: you may need to compile the zookeeper python binding yourself, even though this project includes both 32bit and 64bit linux binaries. Additionally, the scripts rely on some changes to the zkpython binding that are not yet released, so if you do compile yourself you will need to compile zkpython from the Apache ZooKeeper SVN trunk (this should be addressed as soon as ZooKeeper 3.3.0 is released).
On success the last line of output will be “Latency test complete”, otw an exception will be raised.
ZooKeeper client output is written to “cli_log_%d.txt” where %d is the process id of the python process.
Say you have a ZooKeeper ensemble with 5 servers (host1,host2,host3,host4,host5, where each host is running ZK with a client port of 2181). This runs the latency test with 100 znodes per phase, where the data size is 100 bytes. You’ll want to change the number of znodes and size of data based on your expected requirements.
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-latencies.py --servers "host1:2181,host2:2181,host3:2181,host4:2181,host5:2181" --znode_count=100 --znode_size=100 --synchronous
or, for a ZK-style configuration file:
PYTHONPATH=lib.linux-i686-2.6 LD_LIBRARY_PATH=lib.linux-i686-2.6 ./zk-latencies.py --config zk.conf --znode_count=100 --znode_size=100 --synchronous
Result:
Testing latencies on server host1:2181 using syncronous calls created 100 permanent znodes in 177 ms (1.772718 ms/op 564.105378/sec) set 100 znodes in 140 ms (1.408131 ms/op 710.161138/sec) get 100 znodes in 22 ms (0.224919 ms/op 4446.038712/sec) deleted 100 permanent znodes in 134 ms (1.347141 ms/op 742.312648/sec) created 100 ephemeral znodes in 231 ms (2.313130 ms/op 432.314674/sec) deleted 100 ephemeral znodes in 127 ms (1.277809 ms/op 782.589486/sec) Testing latencies on server host2:2181 using syncronous calls created 100 permanent znodes in 141 ms (1.417971 ms/op 705.233211/sec) set 100 znodes in 135 ms (1.352711 ms/op 739.256356/sec) get 100 znodes in 19 ms (0.194461 ms/op 5142.410161/sec) deleted 100 permanent znodes in 133 ms (1.333392 ms/op 749.967189/sec) created 100 ephemeral znodes in 133 ms (1.336930 ms/op 747.982431/sec) deleted 100 ephemeral znodes in 128 ms (1.286640 ms/op 777.218080/sec) .... more .... Latency test complete