spark-redshift

Spark and Redshift integration

Primary language: Scala. License: Apache 2.0.

RedshiftInputFormat

A Hadoop input format for reading Redshift tables that were unloaded to S3 with the ESCAPE option.
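For context, data is typically exported with Redshift's UNLOAD command. A minimal sketch (the table, columns, bucket path, and credentials below are placeholders, not part of this project):

```sql
-- Unload a query result to S3 with ESCAPE so that delimiters and
-- newlines embedded in column values are backslash-escaped, which is
-- what RedshiftInputFormat expects when splitting rows into fields.
UNLOAD ('SELECT name, age FROM users')
TO 's3://my-bucket/unload/users_'
CREDENTIALS 'aws_access_key_id=<id>;aws_secret_access_key=<key>'
ESCAPE;
```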

Usage in Spark Core:

import com.databricks.examples.redshift.input.RedshiftInputFormat

// Keys are byte offsets into the unloaded file; values are rows
// split into their columns, with ESCAPE sequences resolved.
val records = sc.newAPIHadoopFile(
  path,
  classOf[RedshiftInputFormat],
  classOf[java.lang.Long],
  classOf[Array[String]])
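Since each value is already an array of column strings, downstream code can index into it directly. A hedged sketch (the `User` case class and the column order are assumptions for illustration):

```scala
// Hypothetical record type; fields must match the UNLOAD select list.
case class User(name: String, age: Int)

val users = records.map { case (_, fields) =>
  // fields(0) = name, fields(1) = age -- all columns arrive as strings,
  // so numeric columns need explicit parsing.
  User(fields(0), fields(1).toInt)
}
```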

Usage in Spark SQL:

import com.databricks.examples.redshift.input.RedshiftInputFormat._

// redshiftFile() returns a SchemaRDD whose columns are all strings;
// the Seq gives the column names in unload order.
val records: SchemaRDD = sqlContext.redshiftFile(path, Seq("name", "age"))
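Because every column comes back as a string, numeric columns must be cast before use. A minimal sketch, assuming the Spark 1.1+ SchemaRDD API and the hypothetical "users" table name:

```scala
// Register the SchemaRDD as a temporary table and cast in SQL.
records.registerTempTable("users")

val adults = sqlContext.sql(
  "SELECT name, CAST(age AS INT) AS age FROM users WHERE CAST(age AS INT) >= 18")
```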