Invalid keystore format exception on using truststore for cassandra ssl connection
archcode01 opened this issue · 3 comments
Hi,
We are planning to use the migrator to migrate from Cassandra to ScyllaDB. Cassandra is deployed in our dev environment as a 3-pod cluster, and Scylla is likewise a 3-pod cluster deployed using the Scylla Operator.
The Cassandra setup uses an SSL connection and expects truststore details in client connection requests. The truststore is stored in a Kubernetes secret and mounted on the pods as a volume.
It is used successfully by all other clients connecting to Cassandra from the same environment.
Spark is also deployed on the same Kubernetes cluster using a Helm chart, and it has worked perfectly fine for one of our other use cases.
The ScyllaDB migrator code is built as per the documentation and the jar is copied onto the Spark master pod.
The truststore secret is also mounted on the Spark master and is available at a local path on the pod.
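For reference, here is a minimal Scala sketch (not part of the migrator; the object name is just for illustration) that we could run with the same JVM as Spark on the master pod to confirm it can read the mounted truststore. The path and password are the ones from the config.yaml below:

```scala
import java.io.FileInputStream
import java.security.KeyStore

// Hypothetical standalone check, not part of the migrator codebase.
object TruststoreCheck {
  def main(args: Array[String]): Unit = {
    // Path and password taken from the config.yaml below.
    val path     = "/etc/config/tls/cassandra/client/truststore"
    val password = "test123"

    val in = new FileInputStream(path)
    try {
      // The driver also loads the truststore via java.security.KeyStore,
      // so a failure here reproduces the "Invalid keystore format" error
      // independently of Spark and the migrator.
      val ks = KeyStore.getInstance("JKS")
      ks.load(in, password.toCharArray)
      println(s"Loaded truststore with ${ks.size()} entries")
    } finally in.close()
  }
}
```

If this fails with the same "Invalid keystore format" message, the problem lies with the truststore file or the JVM reading it rather than with the migrator itself.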
Below is the config.yaml that we use:
# Example configuration for migrating from Cassandra:
source:
type: cassandra
host: cassandra.cassandra.svc.cluster.local
port: 9042
#optional, if not specified None will be used
localDC: datacenter1
credentials:
username: cassandra
password: test123
# SSL as per https://github.com/scylladb/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-ssl-connection-options
sslOptions:
clientAuthEnabled: false
enabled: true
# all below are optional! (generally just trustStorePassword and trustStorePath is needed)
trustStorePassword: test123
trustStorePath: /etc/config/tls/cassandra/client/truststore
# trustStoreType: JKS
# keyStorePassword: <keyStorePwd>
# keyStorePath: <keyStorePath>
# keyStoreType: JKS
enabledAlgorithms:
- TLS_RSA_WITH_AES_128_CBC_SHA
- TLS_RSA_WITH_AES_256_CBC_SHA
# protocol: TLS
keyspace: janusgraph
table: system_properties
# Consistency Level for the source connection
# Options are: LOCAL_ONE, ONE, LOCAL_QUORUM, QUORUM.
# Connector driver default is LOCAL_ONE. Our recommendation is LOCAL_QUORUM.
# If using ONE or LOCAL_ONE, ensure the source system is fully repaired.
consistencyLevel: LOCAL_QUORUM
# Preserve TTLs and WRITETIMEs of cells in the source database. Note that this
# option is *incompatible* when copying tables with collections (lists, maps, sets).
preserveTimestamps: true
# Number of splits to use - this should be at minimum the amount of cores
# available in the Spark cluster, and optimally more; higher splits will lead
# to more fine-grained resumes. Aim for 8 * (Spark cores).
splitCount: 256
# Number of connections to use to Cassandra when copying
connections: 8
# Number of rows to fetch in each read
fetchSize: 1000
# Optional condition to filter source table data that will be migrated
# where: race_start_date = '2015-05-27' AND race_end_date = '2015-05-27'
# Example for loading from Parquet:
# source:
# type: parquet
# path: s3a://bucket-name/path/to/parquet-directory
# # Optional AWS access/secret key for loading from S3.
# # This section can be left out if running on EC2 instances that have instance profiles with the
# # appropriate permissions. Assuming roles is not supported currently.
# credentials:
# accessKey:
# secretKey:
# Example for loading from DynamoDB:
# source:
# type: dynamodb
# table: <table name>
# # Optional - load from a custom endpoint:
# endpoint:
# # Specify the hostname without a protocol
# host: <host>
# port: <port>
#
# # Optional - specify the region:
# # region: <region>
#
# # Optional - static credentials:
# credentials:
# accessKey: <user>
# secretKey: <pass>
#
# # below controls split factor
# scanSegments: 1
#
# # throttling settings, set based on your capacity (or wanted capacity)
# readThroughput: 1
#
# # The value of dynamodb.throughput.read.percent can be between 0.1 and 1.5, inclusively.
# # 0.5 represents the default read rate, meaning that the job will attempt to consume half of the read capacity of the table.
# # If you increase the value above 0.5, spark will increase the request rate; decreasing the value below 0.5 decreases the read request rate.
# # (The actual read rate will vary, depending on factors such as whether there is a uniform key distribution in the DynamoDB table.)
# throughputReadPercent: 1.0
#
# # how many tasks per executor?
# maxMapTasks: 1
#
# # When transferring DynamoDB sources to DynamoDB targets (such as other DynamoDB tables or Alternator tables),
# # the migrator supports transferring live changes occurring on the source table after transferring an initial
# # snapshot. This is done using DynamoDB streams and incurs additional charges due to the Kinesis streams created.
# # Enable this flag to transfer live changes after transferring an initial snapshot. The migrator will continue
# # replicating changes endlessly; it must be stopped manually.
# #
# # NOTE: For the migration to be performed losslessly, the initial snapshot transfer must complete within 24 hours.
# # Otherwise, some captured changes may be lost due to the retention period of the table's stream.
# #
# # NOTE2: The migrator does not currently delete the created Dynamo stream. Delete it manually after ending the
# # migrator run.
# streamChanges: false
# Configuration for the database you're copying into
target:
type: scylla
host: simple-cluster-client.cassandra.cluster.svc.local
port: 9042
#optional, if not specified None will be used
localDC: datacenter1
credentials:
username: cassandra
password: testcassandra123
# SSL as per https://github.com/scylladb/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-ssl-connection-options
#sslOptions:
# clientAuthEnabled: false
# enabled: false
# all below are optional! (generally just trustStorePassword and trustStorePath is needed)
# trustStorePassword: <pass>
# trustStorePath: <path>
# trustStoreType: JKS
# keyStorePassword: <pass>
# keyStorePath: <path>
# keyStoreType: JKS
# enabledAlgorithms:
# - TLS_RSA_WITH_AES_128_CBC_SHA
# - TLS_RSA_WITH_AES_256_CBC_SHA
# protocol: TLS
# NOTE: The destination table must have the same schema as the source table.
# If you'd like to rename columns, that's ok - see the renames parameter below.
keyspace: janusgraph
table: system_properties
# Consistency Level for the target connection
# Options are: LOCAL_ONE, ONE, LOCAL_QUORUM, QUORUM.
# Connector driver default is LOCAL_QUORUM.
consistencyLevel: LOCAL_QUORUM
# Number of connections to use to Scylla when copying
connections: 16
# Spark pads decimals with zeros appropriate to their scale. This causes values
# like '3.5' to be copied as '3.5000000000...' to the target. There's no good way
# currently to preserve the original value, so this flag can strip trailing zeros
# on decimal values before they are written.
stripTrailingZerosForDecimals: false
# if we cannot persist timestamps (so preserveTimestamps==false)
# we can enforce in writer a single TTL or writetimestamp for ALL written records
# such writetimestamp can be e.g. set to time BEFORE starting dual writes
# and this will make your migration safe from overwriting dual write
# even for collections
# ALL rows written will get the same TTL or writetimestamp or both
# (you can uncomment just one of them, or all or none)
# TTL in seconds (sample 7776000 is 90 days)
#writeTTLInS: 7776000
# writetime in microseconds (sample 1640998861000 is Saturday, January 1, 2022 2:01:01 AM GMT+01:00 )
#writeWritetimestampInuS: 1640998861000
# Example for loading into a DynamoDB target (for example, Scylla's Alternator):
# target:
# type: dynamodb
# table: <table name>
# # Optional - write to a custom endpoint:
# endpoint:
# # If writing to Scylla Alternator, prefix the hostname with 'http://'.
# host: <host>
# port: <port>
#
# # Optional - specify the region:
# # region: <region>
#
# # Optional - static credentials:
# credentials:
# accessKey: <user>
# secretKey: <pass>
#
# # Split factor for reading/writing. This is required for Scylla targets.
# scanSegments: 1
#
# # throttling settings, set based on your capacity (or wanted capacity)
# readThroughput: 1
#
# # The value of dynamodb.throughput.read.percent can be between 0.1 and 1.5, inclusively.
# # 0.5 represents the default read rate, meaning that the job will attempt to consume half of the read capacity of the table.
# # If you increase the value above 0.5, spark will increase the request rate; decreasing the value below 0.5 decreases the read request rate.
# # (The actual read rate will vary, depending on factors such as whether there is a uniform key distribution in the DynamoDB table.)
# throughputReadPercent: 1.0
#
# # how many tasks per executor?
# maxMapTasks: 1
# Savepoints are configuration files (like this one), saved by the migrator as it
# runs. Their purpose is to skip token ranges that have already been copied. This
# configuration only applies when copying from Cassandra/Scylla.
savepoints:
# Where should savepoint configurations be stored? This is a path on the host running
# the Spark driver - usually the Spark master.
path: /app/savepoints
# Interval in which savepoints will be created
intervalSeconds: 300
# Column renaming configuration. If you'd like to rename any columns, specify them like so:
# - from: source_column_name
# to: dest_column_name
renames: []
# Which token ranges to skip. You shouldn't need to fill this in normally; the migrator will
# create a savepoint file with this filled.
skipTokenRanges: []
# Configuration section for running the validator. The validator is run manually (see README)
# and currently only supports comparing a Cassandra source to a Scylla target.
validation:
# Should WRITETIMEs and TTLs be compared?
compareTimestamps: true
# What difference should we allow between TTLs?
ttlToleranceMillis: 60000
# What difference should we allow between WRITETIMEs?
writetimeToleranceMillis: 1000
# How many differences to fetch and print
failuresToFetch: 100
# What difference should we allow between floating point numbers?
floatingPointTolerance: 0.001
# What difference in ms should we allow between timestamps?
timestampMsTolerance: 0
After copying the jar and the config.yaml file, we submit the Spark job as per the documentation and get the exception below.
./spark-submit --class com.scylladb.migrator.Migrator --master spark://myspark-master-svc:7077 --conf spark.scylla.config=/tmp/config.yaml /tmp/scylla-migrator-assembly-0.0.1.jar
Exception:
```
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/06/22 08:52:24 INFO SparkContext: Running Spark version 2.4.4
23/06/22 08:52:24 INFO SparkContext: Submitted application: scylla-migrator
23/06/22 08:52:24 INFO SecurityManager: Changing view acls to: spark
23/06/22 08:52:24 INFO SecurityManager: Changing modify acls to: spark
23/06/22 08:52:24 INFO SecurityManager: Changing view acls groups to:
23/06/22 08:52:24 INFO SecurityManager: Changing modify acls groups to:
23/06/22 08:52:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
23/06/22 08:52:24 INFO Utils: Successfully started service 'sparkDriver' on port 35429.
23/06/22 08:52:24 INFO SparkEnv: Registering MapOutputTracker
23/06/22 08:52:24 INFO SparkEnv: Registering BlockManagerMaster
23/06/22 08:52:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/06/22 08:52:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/06/22 08:52:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7bb2f1d7-3dcc-45cf-8431-a264eb4f9843
23/06/22 08:52:24 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
23/06/22 08:52:24 INFO SparkEnv: Registering OutputCommitCoordinator
23/06/22 08:52:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/06/22 08:52:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://myspark-master-0.myspark-headless.cassandra.svc.cluster.local:4040
23/06/22 08:52:25 INFO SparkContext: Added JAR file:/tmp/scylla-migrator-assembly-0.0.1.jar at spark://myspark-master-0.myspark-headless.cassandra.svc.cluster.local:35429/jars/scylla-migrator-assembly-0.0.1.jar with timestamp 1687423945113
23/06/22 08:52:25 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://myspark-master-svc:7077...
23/06/22 08:52:25 INFO TransportClientFactory: Successfully created connection to myspark-master-svc/10.0.110.171:7077 after 32 ms (0 ms spent in bootstraps)
23/06/22 08:52:25 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20230622085225-0000
23/06/22 08:52:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34721.
23/06/22 08:52:25 INFO NettyBlockTransferService: Server created on myspark-master-0.myspark-headless.cassandra.svc.cluster.local:34721
23/06/22 08:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230622085225-0000/0 on worker-20230622082658-10.12.0.86-33981 (10.12.0.86:33981) with 2 core(s)
23/06/22 08:52:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/06/22 08:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20230622085225-0000/0 on hostPort 10.12.0.86:33981 with 2 core(s), 1024.0 MB RAM
23/06/22 08:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230622085225-0000/1 on worker-20230622082739-10.12.0.51-39817 (10.12.0.51:39817) with 2 core(s)
23/06/22 08:52:25 INFO StandaloneSchedulerBackend: Granted executor ID app-20230622085225-0000/1 on hostPort 10.12.0.51:39817 with 2 core(s), 1024.0 MB RAM
23/06/22 08:52:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, myspark-master-0.myspark-headless.cassandra.svc.cluster.local, 34721, None)
23/06/22 08:52:25 INFO BlockManagerMasterEndpoint: Registering block manager myspark-master-0.myspark-headless.cassandra.svc.cluster.local:34721 with 366.3 MB RAM, BlockManagerId(driver, myspark-master-0.myspark-headless.cassandra.svc.cluster.local, 34721, None)
23/06/22 08:52:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, myspark-master-0.myspark-headless.cassandra.svc.cluster.local, 34721, None)
23/06/22 08:52:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, myspark-master-0.myspark-headless.cassandra.svc.cluster.local, 34721, None)
23/06/22 08:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230622085225-0000/0 is now RUNNING
23/06/22 08:52:25 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/06/22 08:52:25 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230622085225-0000/1 is now RUNNING
23/06/22 08:52:27 INFO migrator: Loaded config: MigratorConfig(Cassandra(cassandra.cassandra.svc.cluster.local,9042,Some(datacenter1),Some(Credentials(cassandra,cassa@2@2!)),Some(SSLOptions(false,true,Some(Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA)),None,None,None,None,Some(cassa@2@2!),Some(/etc/config/tls/gremlin/client/truststore),None)),janusgraph,system_properties,Some(256),Some(8),1000,true,None,LOCAL_QUORUM),Scylla(simple-cluster-client.cassandra.cluster.svc.local,9042,Some(datacenter1),Some(Credentials(cassandra,cassandra)),None,janusgraph,system_properties,Some(16),false,None,None,LOCAL_QUORUM),List(),Savepoints(300,/app/savepoints),Set(),Validation(true,60000,1000,100,0.001,0))
23/06/22 08:52:27 INFO Cassandra: Using consistencyLevel [LOCAL_QUORUM] for SOURCE based on source config [LOCAL_QUORUM]
Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {cassandra.cassandra.svc.cluster.local:9042} :: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:181)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:169)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:169)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:89)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
at com.scylladb.migrator.readers.Cassandra$.readDataframe(Cassandra.scala:231)
at com.scylladb.migrator.Migrator$.main(Migrator.scala:47)
at com.scylladb.migrator.Migrator.main(Migrator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:236)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:94)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslEngineFactory(DefaultDriverContext.java:409)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.lambda$new$4(DefaultDriverContext.java:281)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslEngineFactory(DefaultDriverContext.java:764)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslHandlerFactory(DefaultDriverContext.java:468)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslHandlerFactory(DefaultDriverContext.java:818)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:326)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1000(DefaultSession.java:280)
at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:126)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at com.datastax.oss.driver.shaded.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
at com.datastax.oss.driver.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at com.datastax.oss.driver.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:229)
... 18 more
Caused by: java.io.IOException: Invalid keystore format
at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:663)
at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
at java.security.KeyStore.load(KeyStore.java:1445)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.buildContext(DefaultSslEngineFactory.java:126)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:72)
... 23 more
```
We have followed all the steps in the documentation. No keystore is used or required to connect to Cassandra,
but we still get the above exception.
Can someone please help or point us in the right direction?
This issue was reported by mistake. There is no problem with the library; it works as expected.
The problem was that the Java version was not compatible with the truststore. After upgrading to a compatible Java version, the problem no longer occurred.
Apologies for creating this issue. If there is a way to delete it, it can be deleted.
Just a note: the Java exception thrown in this case says the keystore has an invalid format, even though only a truststore is being used. This is a quirk of the Java code.
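For anyone who hits the same error: the stack trace above shows that DefaultSslEngineFactory loads the truststore through java.security.KeyStore, which is why a truststore problem is reported as an invalid keystore. Below is a small, purely illustrative sketch (reusing the truststore path and password from the config above) that probes which store formats the JVM running Spark can actually read:

```scala
import java.io.FileInputStream
import java.security.KeyStore

// Hypothetical probe, not part of the migrator codebase.
object StoreTypeProbe {
  def main(args: Array[String]): Unit = {
    // Path and password from the config.yaml in this issue.
    val path     = "/etc/config/tls/cassandra/client/truststore"
    val password = "test123".toCharArray

    Seq("JKS", "PKCS12").foreach { storeType =>
      val in = new FileInputStream(path)
      try {
        KeyStore.getInstance(storeType).load(in, password)
        println(s"$storeType: readable by this JVM")
      } catch {
        // An incompatible or differently formatted store surfaces here,
        // with the same "Invalid keystore format" wording as in the issue.
        case e: Exception => println(s"$storeType: ${e.getMessage}")
      } finally in.close()
    }
  }
}
```

If neither format loads, the truststore was most likely created with a tool or Java version whose output the JVM running Spark cannot parse, which matches the resolution described above.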