Tool for selecting and maintaining a subset for Cassandra data for canary testing
Canary testing is a technique to test new software versions in production by exposing a subset of users to the new version. When using Apache Cassandra as database, the distribution of data within the Cassandra cluster might worth considering. This tool connects to a Cassandra cluster, queries its topology and schema metadata and selects canaries based from all keyspaces and tables for all token ranges. Conducting tests on the canary set acquired that way will guarantee that the whole Cassandra cluster is covered.
In the scope of this tool a canary is considered a record consisting of mandatory field Origin
and PrimaryKey
. The Origin
field contains the name of the table the canary was selected from. The PrimaryKey
field contains the primary key of the canary record. The primary key is represented as a map of column names to values.
Optional fields of a canary are reserved and may be add in future versions of the tool.
The tool needs Java-21 or later.
Download and extract any of the distribution bundles from the releases page.
Configuration is done via TypeSafe Config. Place a file at etc/application.conf
to override the defaults. Currently, the only topic covered by the configuration is the Cassandra connection. See the DataStax Java Driver for more details.
Setup a truststore for the client using:
keytool -import -alias CAroot -file ca.crt -keystore etc/truststore.jks
Add the following snippet to your configuration:
datastax-java-driver {
basic {
ssl-engine-factory {
class = DefaultSslEngineFactory
truststore-path = etc/truststore.jks
truststore-password = changeit
}
}
}
The tool is invoked via the vendored wrapper script bin/aviary
from the distribution package. This provides a simple command line interface allowing to select, list and verify canaries.
By default, the tool is connecting to a Cassandra cluster running on localhost
using the default port 9042
. You may add --contact-point=<host>
options to the command line to connect to a different cluster. Using --auth-user=<username>
and --auth-password
allows to authenticate against the cluster. In some use-cases --local-dc=<DC>
and --port=<port>
might be useful.
These options are available for commands connecting to Cassandra.
Having the wrapper script in your PATH
, you can select canaries using:
% aviary select
test.test: 8/16
test.foo: 6/16
foo.multitest: 5/16
This will select canaries for all keyspaces and tables in the Cassandra cluster. The output shows the number of selected canaries and the total number of token ranges for each table.
By default, the selector queries topology and schema metadata of the connected Cassandra cluster and iterates over all keyspaces and tables selecting a single canary from each token range. However, the selector interface allows to customize the selection process. Using the --selector=<SelectorClass>
option allows to specify a custom selector class. The class must implement the io.github.rtib.cassandra.aviary.selector.ICanarySelector
interface. The default selector is io.github.rtib.cassandra.aviary.selector.RangeSelector
.
The set of selected canary entries is stored in avaiary.json
file in the current directory. Output location might be changed using -o <file>
options. The file is overwritten on each invocation of the selector.
The list
command allows to list the canaries stored in the avaiary.json
file:
% aviary list
Canary{Origin=foo.multitest, PrimaryKey={type=test, name=a5, entry=1}}
Canary{Origin=foo.multitest, PrimaryKey={type=test, name=a1, entry=1}}
Canary{Origin=foo.multitest, PrimaryKey={type=test, name=a4, entry=1}}
Canary{Origin=test.test, PrimaryKey={id=5}}
Canary{Origin=foo.multitest, PrimaryKey={type=test, name=a2, entry=1}}
Canary{Origin=test.test, PrimaryKey={id=15}}
Canary{Origin=test.test, PrimaryKey={id=16}}
Canary{Origin=test.test, PrimaryKey={id=17}}
Canary{Origin=test.test, PrimaryKey={id=7}}
Canary{Origin=test.test, PrimaryKey={id=10}}
Canary{Origin=test.foo, PrimaryKey={bar=a7}}
Canary{Origin=test.foo, PrimaryKey={bar=a6}}
Canary{Origin=test.foo, PrimaryKey={bar=a8}}
Canary{Origin=test.foo, PrimaryKey={bar=a2}}
Canary{Origin=test.foo, PrimaryKey={bar=a1}}
Canary{Origin=test.foo, PrimaryKey={bar=a5}}
Canary{Origin=foo.multitest, PrimaryKey={type=test, name=a3, entry=1}}
Canary{Origin=test.test, PrimaryKey={id=9}}
Canary{Origin=test.test, PrimaryKey={id=11}}
The verify
command allows to verify the canaries stored in the avaiary.json
file:
% aviary verify
test.test: 8/8
test.foo: 6/6
foo.multitest: 5/5
This will verify that the canaries are still present in the Cassandra cluster. The output shows the number of verified canaries and the total number of canaries for each table.