Spark Library for Bulk Loading into Cassandra

Primary language: Scala. License: Apache License 2.0 (Apache-2.0).

Spark2CassandraBulkLoad

This project is based on Spark2Cassandra.

It upgrades the Spark and Cassandra versions used by that utility.

Features

  1. Convert an RDD or DataFrame to SSTable files.
  2. Stream the SSTable files to the Cassandra nodes.

Requirements

Spark2CassandraBulkLoad supports Spark 2.x and above.

Spark2CassandraBulkLoad Version | Spark Cassandra Connector Version | Cassandra Java Driver Version | JDK Version
1.X.X | [2.0, 2.5) | [,4.0) | 1.8+
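
The version ranges above use interval notation: a square bracket includes the bound and a parenthesis excludes it, so `[2.0, 2.5)` means "2.0 or later, but below 2.5", and `[,4.0)` means "any version below 4.0". The sketch below illustrates how such a range can be checked. `VersionRange` and its methods are hypothetical helpers written for this example and are not part of Spark2CassandraBulkLoad; it assumes purely numeric, dot-separated version strings.

```scala
// Hypothetical helper for illustration only; not part of Spark2CassandraBulkLoad.
object VersionRange {
  // Parse "2.4.3" into Seq(2, 4, 3). Non-numeric suffixes (e.g. "-SNAPSHOT") are not handled.
  def parse(version: String): Seq[Int] =
    version.split('.').toSeq.map(_.toInt)

  // Compare two versions component by component, padding the shorter one with zeros.
  def compare(a: Seq[Int], b: Seq[Int]): Int = {
    val len = math.max(a.length, b.length)
    a.padTo(len, 0).zip(b.padTo(len, 0))
      .map { case (x, y) => Integer.compare(x, y) }
      .find(_ != 0)
      .getOrElse(0)
  }

  // Half-open range check: lower bound inclusive (when present), upper bound exclusive.
  def inRange(version: String, lower: Option[String], upper: String): Boolean = {
    val v = parse(version)
    lower.forall(lo => compare(v, parse(lo)) >= 0) && compare(v, parse(upper)) < 0
  }
}
```

For example, `VersionRange.inRange("2.4.3", Some("2.0"), "2.5")` is true for the connector range, while `VersionRange.inRange("2.5.0", Some("2.0"), "2.5")` is false because the upper bound is excluded.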

Downloads

SBT

libraryDependencies += "com.joswlv.spark.cassandra.bulk" %% "Spark2CassandraBulkLoad" % "1.0.3"

Maven (JCenter)

<dependency>
	<groupId>com.joswlv.spark.cassandra.bulk</groupId>
	<artifactId>Spark2CassandraBulkLoad</artifactId>
	<version>1.0.3</version>
</dependency>

Gradle

compile 'com.joswlv.spark.cassandra.bulk:Spark2CassandraBulkLoad:1.0.3'

Usage

Bulk Loading into Cassandra

// Import the following to have access to the `bulkLoadToCass()` function for RDDs or DataFrames.
import com.joswlv.spark.cassandra.bulk.rdd._
import com.joswlv.spark.cassandra.bulk.sql._

// RDD: specify the `keyspaceName` and the `tableName` to write to.
rdd.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)

// DataFrame: specify the `keyspaceName` and the `tableName` to write to.
df.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)
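
For context, the DataFrame call above might sit inside a small Spark application like the following sketch. Several things here are assumptions rather than documented behavior: the `demo.users` table and its columns are hypothetical, the table is assumed to already exist with columns matching the DataFrame, and the Cassandra host is assumed to be read from the Spark Cassandra connector setting `spark.cassandra.connection.host`. Only the import and the `bulkLoadToCass(keyspaceName, tableName)` call come from this README.

```scala
import org.apache.spark.sql.SparkSession
import com.joswlv.spark.cassandra.bulk.sql._

object BulkLoadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BulkLoadExample")
      .master("local[*]")
      // Assumption: the library picks up the connector's connection host setting.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical target table, assumed to exist already:
    //   CREATE TABLE demo.users (id int PRIMARY KEY, name text);
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Write the DataFrame using the call shown in the usage section above.
    df.bulkLoadToCass(
      keyspaceName = "demo",
      tableName = "users"
    )

    spark.stop()
  }
}
```

Running this requires a reachable Cassandra cluster and the dependency from the Downloads section on the classpath.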