
A library that provides useful extensions to Apache Spark.


Spark Extension

This project provides extensions to the Apache Spark project:

  • Diff: A diff transformation for Datasets that computes the differences between two Datasets, i.e. which rows to insert, delete or change to get from one Dataset to the other.
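
To make the idea of a row-level diff concrete, here is a minimal sketch written with plain Spark only. It is not this library's implementation or API: the object `DiffSketch`, the helper `naiveDiff`, the column names `id` and `value`, and the labels `insert`, `delete`, `change` and `no change` are all assumptions chosen for illustration. The sketch joins both sides on the `id` column with a full outer join and labels each row by what would have to happen to turn the left Dataset into the right one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical illustration, NOT part of spark-extension.
object DiffSketch {

  // Assumes both inputs have exactly the columns "id" (unique key) and "value".
  def naiveDiff(left: DataFrame, right: DataFrame): DataFrame = {
    // Marker columns tell a missing row apart from a row whose value is null.
    val l = left.select(col("id"), col("value").as("left_value"), lit(true).as("in_left"))
    val r = right.select(col("id"), col("value").as("right_value"), lit(true).as("in_right"))

    l.join(r, Seq("id"), "full_outer")
      .withColumn("diff",
        when(col("in_left").isNull, "insert")            // only in right
          .when(col("in_right").isNull, "delete")        // only in left
          .when(col("left_value") <=> col("right_value"), "no change") // null-safe equality
          .otherwise("change"))
      .select("diff", "id", "left_value", "right_value")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("diff-sketch")
      .getOrCreate()
    import spark.implicits._

    val left = Seq((1, "one"), (2, "two"), (3, "three")).toDF("id", "value")
    val right = Seq((1, "one"), (2, "Two"), (4, "four")).toDF("id", "value")

    // Row order of the output is not guaranteed.
    naiveDiff(left, right).show()

    spark.stop()
  }
}
```

With this sample data, id 1 is labelled `no change`, id 2 `change`, id 3 `delete` and id 4 `insert`. The sketch handles a single value column and a single key column only; a general diff transformation has to deal with arbitrary schemas and multiple key columns.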

Using Spark Extension

SBT

Add this line to your build.sbt file:

libraryDependencies += "uk.co.gresearch.spark" %% "spark-extension" % "1.0.0"

Maven

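Add this dependency to your pom.xml file. The _2.12 suffix in the artifactId is the Scala binary version, which Maven needs spelled out; in SBT, the %% operator appends it automatically: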

<dependency>
  <groupId>uk.co.gresearch.spark</groupId>
  <artifactId>spark-extension_2.12</artifactId>
  <version>1.0.0</version>
</dependency>