/aws-keyspace-backup-to-s3

Command line Tool for Keyspace CSV beckups

Primary LanguageJavaApache License 2.0Apache-2.0

aws-keyspace-backup-to-s3

(JAR can be found in build/libs/keystore-backup.jar)

Command line Tool for Keyspace CSV backups

  1. Reading whole table (or provided query) and posting in -in-memory queue.
  2. Storing lines in CSV file and
    1. Uploading in S3 bucket (with multi-part upload)
    2. Storing locally in provided file

AWS keystore config file example:

datastax-java-driver {
basic {
  load-balancing-policy {
    local-datacenter = us-east-2
  }
  contact-points = ["cassandra.us-east-2.amazonaws.com:9142"]
  request {
  page-size = 20000
  timeout = 140 seconds
  consistency = LOCAL_QUORUM
 }
} 
advanced {
 control-connection {
  timeout = 40 seconds
}
 connection {
 connect-timeout = 40 seconds
 init-query-timeout = 40 seconds
}
 auth-provider {
  class = PlainTextAuthProvider
  username = "my-user
  password = "my-pass"
 }
 ssl-engine-factory {
 class = DefaultSslEngineFactory
 truststore-password = "my-truststore-jks-pass"
 truststore-path = "truststore.jks"
 }
 metadata {
 token-map.enabled = false
 schema.enabled = true 
 }
 }
}

Pre req. Java 11

Tool is not fully finished, still development in progress.

Tool arguments (optional)

  -command,--command                                   Skip menu and execute [backup, restore, reinsert, delete] command. Otherwise use interactive menu.
  -fs,--fs-result-path                                 Path where to store files locally.
  -imq,--in-memory-queue                               In memory blocking queue rows size.
  -ims,--in-memory-stream-size-mb                      In memory buffered stream size in MB. If AWS s3 storage will be used, then this size will be one multi part size value in MB.
  -imtime,--in-memory-queue-poll-timeout-secs          In memory buffered stream size building from queue polling timeout in seconds. AWS Keyspace operations wait records in queue timeout (600).
  -kdelb,--keyspace-delete-batch-size                  AWS Keyspace delete batch size (aws max30).
  -ke,--keyspace-empty-to-finish                       AWS Keyspace returned empty pages assume as finished (max int).
  -kerrp,--keyspace-stop-after-error-pages-count       AWS Keyspace stop execution after errored/failed pages fetching.
  -kf,--keyspace-config-file                           AWS Keyspace configuration file path.
  -kk,--keyspace-keyspace                              AWS Keyspace storage 'keyspace'. If query will be provided this value will be ignored.
  -kt,--keyspace-table                                 AWS Keyspace storage 'table'. If query will be provided this value will be ignored.
  -kp,--keyspace-pages-to-skip                         AWS Keyspace pages to skip.
  -kq,--keyspace-query                                 AWS Keyspace data fetching query. Will ignoring keyspace.table if this value provided.
  -kquewt,--keyspace-wait-item-in-queue-mins           AWS Keyspace operations wait records in queue time in minutes (15 min).
  -krate,--keyspace-update-rate-limiter-per-sec        AWS Keyspace modify rate limiter (500!).
  -kthrds,--keyspace-write-thread-counts               AWS Keyspace write (restore/reinsert/delete) threads count (default 8).
  -kttl,--keyspace-reinsert-ttl-value                  AWS Keyspace reinsert ttl value (15552000 = 1y). TTL will be not set on reinsert or restore/insert if value will be les then 1.
  -s3b,--s3-bucket                                     AWS S3 bucket.
  -s3f,--s3-folder                                     AWS S3 folder (object prefix in bucket).
  -s3r,--s3-region                                     AWS S3 bucket region.
  -s3res,--s3-restore-from-csv-key                     AWS S3 file key to to restore from bucket (full path in bucket).
  -s3suf,--s3-store-file-suffix                        AWS S3 file to store suffix (<timestamp>_<suffix>.csv).
  -statheadr,--stat-reprint-header-after-seconds       Statistic header reprinting after seconds.
  -statline,--stat-print-in-new-line-after-secs        Statistic new line printing after seconds.
  -statstop,--stat-print-stop-after-no-changes-secs    Statistic printing stopping after not changes found.
  -stattime,--stat-update-timeout-in-mills             Statistic line refresh timeout in milliseconds.

Example startup command with default arguments

  java -jar keystore-backup.jar \
  -kq "select field,userid,deviceid,date,hour,minute,timestamp from my_keyspace.my_table where timestamp >= '2021-01-01T00:00:00.000Z' and timestamp < '2022-01-01T00:00:00.000Z' ALLOW FILTERING" \
  -kf "keyspace_prd.conf" \
  -kk my_keyspace \
  -kt my_table \
  -fs "v1_dump.csv" \
  -s3b my-buscket \
  -s3f my-data-2021-data \
  -s3r us-east-1

Backup command

  java -jar keystore-backup.jar \
  -command backup \
  --keyspace-config-file "test.conf" \
  --keyspace-keyspace my_keyspace \
  --keyspace-table my_table \
  --s3-bucket mk-app-test \
  --s3-folder small-test \
  --s3-store-file-suffix all-data \
  --s3-region us-east-1 \
  --stat-print-stop-after-no-changes-secs 45 \
  --in-memory-queue-poll-timeout-secs 5

Delete command

  java -jar keystore-backup.jar \
  -command delete \
  --keyspace-config-file "test.conf" \
  --keyspace-keyspace my_keyspace \
  --keyspace-table my_table \
  --in-memory-queue-poll-timeout-secs 60
  --keyspace-update-rate-limiter-per-sec 500
  --stat-print-stop-after-no-changes-secs 125

Restore command (ttl 3 minutes)

  java -jar keystore-backup.jar \
  -command restore \
  --keyspace-config-file "test.conf" \
  --keyspace-keyspace my_keyspace \
  --keyspace-table my_table \
  --s3-bucket mk-app-test \
  --s3-region us-east-1 \
  --s3-restore-from-csv-key /mk-app-test/small-test/my_keyspace/my_table/2022_12_13_09_07_35_all-data.csv \
  --stat-print-stop-after-no-changes-secs 125 \
  --in-memory-queue-poll-timeout-secs 15
  --keyspace-update-rate-limiter-per-sec 500 \
  --keyspace-reinsert-ttl-value 180

Reinsert with TTL command (ttl 3 minutes)

  java -jar keystore-backup.jar \
  -command reinsert \
  --keyspace-config-file "test.conf" \
  --keyspace-keyspace my_keyspace \
  --keyspace-table my_table \
  --stat-print-stop-after-no-changes-secs 125 \
  --in-memory-queue-poll-timeout-secs 15
  --keyspace-update-rate-limiter-per-sec 500 \
  --keyspace-reinsert-ttl-value 180

TODO:

  1. Exception handling clean up
  2. Exception recovery on data fetching
  3. Logs clean up