Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.
Now an official part of the Helm incubator/cassandra chart!
wget -qO- https://github.com/maorfr/cain/releases/download/0.1.0/cain.tar.gz | sudo tar xvz -C /usr/local/bin
Cain uses glide as a dependency management tool, since some of the referenced packages are not available using dep.
glide up
go build -o cain cmd/cain.go
Cain performs a backup in the following way:
- Backup the keyspace schema (using cqlsh) and copy it to S3.
- Get backup data using nodetool snapshot - it creates a snapshot of the keyspace in all Cassandra pods in the given namespace (according to selector).
- Copy the files in parallel to S3 using Skbn - it copies the files to the specified dst, under namespace/<cassandrClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
- Clear all snapshots.
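The destination layout used in the copy step can be sketched in Go as follows. This is only an illustration of the path structure described above, not Cain's actual code: the function name `backupPrefix`, its parameters, and the sample hash value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// backupPrefix (hypothetical helper) joins the path elements from the
// backup description: dst/namespace/cluster/keyspace/schemaHash/tag/.
// strings.Join is used instead of path.Join so that the "//" in
// "s3://" is left untouched.
func backupPrefix(dst, namespace, clusterName, keyspace, schemaHash, tag string) string {
	parts := []string{
		strings.TrimRight(dst, "/"),
		namespace,
		clusterName,
		keyspace,
		schemaHash,
		tag,
	}
	return strings.Join(parts, "/") + "/"
}

func main() {
	// "0123abcd" stands in for a real schema checksum.
	fmt.Println(backupPrefix("s3://db-backup/cassandra", "default", "ring01", "keyspace", "0123abcd", "20180903091624"))
	// prints: s3://db-backup/cassandra/default/ring01/keyspace/0123abcd/20180903091624/
}
```

Because the schema hash is part of the path, backups taken under different schemas of the same keyspace end up in separate locations.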
$ cain backup --help
backup cassandra cluster to S3
Usage:
cain backup [flags]
Flags:
-c, --container string container name to act on (default "cassandra")
--dst string destination to backup to. Example: s3://bucket/cassandra
-h, --help help for backup
-k, --keyspace string keyspace to act on
-n, --namespace string namespace to find cassandra cluster
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism (default 1)
-l, --selector string selector to filter on
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra \
-p 0
Cain performs a restore in the following way:
- Truncate all tables in keyspace.
- Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
- Load new data using nodetool refresh.
- Cain does not currently restore the schema (it must be loaded before restoring).
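The "same keyspace schema" requirement follows from the path layout: the schema hash is part of the location restore reads from. A minimal Go sketch of that idea, assuming a plain list of object keys is available; `restorePrefix`, `hasBackup`, and the hash values are hypothetical and not part of Cain:

```go
package main

import (
	"fmt"
	"strings"
)

// restorePrefix (hypothetical) builds src/keyspace/schemaHash/tag/.
func restorePrefix(src, keyspace, schemaHash, tag string) string {
	parts := []string{strings.TrimRight(src, "/"), keyspace, schemaHash, tag}
	return strings.Join(parts, "/") + "/"
}

// hasBackup reports whether any listed object key lives under prefix.
func hasBackup(keys []string, prefix string) bool {
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			return true
		}
	}
	return false
}

func main() {
	src := "s3://db-backup/cassandra/default/ring01"
	// Keys as they might appear after a backup taken with schema hash "0123abcd".
	keys := []string{src + "/keyspace/0123abcd/20180903091624/table1/data.db"}

	same := restorePrefix(src, "keyspace", "0123abcd", "20180903091624")
	changed := restorePrefix(src, "keyspace", "ffff0000", "20180903091624")
	fmt.Println(hasBackup(keys, same))    // prints: true
	fmt.Println(hasBackup(keys, changed)) // prints: false
}
```

If the current schema hashes to a different value than the one used at backup time, nothing is found under the computed prefix.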
$ cain restore --help
restore cassandra cluster from S3
Usage:
cain restore [flags]
Flags:
-c, --container string container name to act on (default "cassandra")
-h, --help help for restore
-k, --keyspace string keyspace to act on
-n, --namespace string namespace to find cassandra cluster
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism (default 1)
-l, --selector string selector to filter on
--src string source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name
-t, --tag string tag to restore
cain restore \
--src s3://db-backup/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624 \
-p 0
Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).
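As a rough illustration of what a schema checksum is, the Go sketch below hashes the schema text and hex-encodes the digest. The choice of SHA-256 is an assumption made for this example; the text above does not say which algorithm Cain uses, and `schemaChecksum` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// schemaChecksum (hypothetical) returns a hex digest of the schema text.
// SHA-256 is assumed here purely for illustration.
func schemaChecksum(schema []byte) string {
	sum := sha256.Sum256(schema)
	return hex.EncodeToString(sum[:])
}

func main() {
	v1 := []byte("CREATE TABLE ks.t (id int PRIMARY KEY);")
	v2 := []byte("CREATE TABLE ks.t (id int PRIMARY KEY, name text);")
	// Any change to the schema text produces a different checksum.
	fmt.Println(schemaChecksum(v1) == schemaChecksum(v1)) // prints: true
	fmt.Println(schemaChecksum(v1) == schemaChecksum(v2)) // prints: false
}
```

A checksum like this gives backup and restore a short, stable identifier for "the same schema" without comparing the full text.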
$ cain schema --help
get schema of cassandra cluster
Usage:
cain schema [flags]
Flags:
-c, --container string container name to act on (default "cassandra")
-h, --help help for schema
-k, --keyspace string keyspace to act on
-n, --namespace string namespace to find cassandra cluster
-l, --selector string selector to filter on
--sum print only checksum
cain schema \
-n default \
-l release=cassandra \
-k keyspace
cain schema \
-n default \
-l release=cassandra \
-k keyspace \
--sum
Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.
Cain tries to get credentials in the following order:
- if KUBECONFIG environment variable is set - skbn will use the current context from that config file
- if ~/.kube/config exists - skbn will use the current context from that config file with an out-of-cluster client configuration
- if ~/.kube/config does not exist - skbn will assume it is working from inside a pod and will use an in-cluster client configuration
Skbn uses the default AWS credentials chain.