This project provides extensions to the Apache Spark project:
- Diff: A `diff` transformation for `Dataset`s that computes the differences between two datasets, i.e. which rows to add, delete or change to get from one dataset to the other.
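The sketch below illustrates the add/delete/change idea. It uses plain Scala collections with no Spark. It is not the spark-extension implementation or its API. The `DiffSketch` object, the `DiffRow` type and the single-letter markers (N = no change, I = inserted, D = deleted, C = changed) are hypothetical names chosen for illustration. The sketch also assumes that every row is identified by a unique key.

```scala
// Illustrative sketch only: plain Scala collections, NOT the spark-extension API.
// Assumes each "row" is identified by a unique key (the Map key).
object DiffSketch {
  // One line of the diff: a change marker, the key, and the left/right values.
  final case class DiffRow[K, V](marker: String, key: K, left: Option[V], right: Option[V])

  /** Compares two keyed collections and reports, for each key, whether the row
    * is unchanged (N), inserted in right (I), deleted from left (D) or changed (C).
    */
  def diff[K: Ordering, V](left: Map[K, V], right: Map[K, V]): Seq[DiffRow[K, V]] = {
    // Union of all keys, sorted so the output order is deterministic.
    val keys = (left.keySet ++ right.keySet).toSeq.sorted
    keys.map { k =>
      (left.get(k), right.get(k)) match {
        case (Some(l), Some(r)) if l == r => DiffRow("N", k, Some(l), Some(r))
        case (Some(l), Some(r))           => DiffRow("C", k, Some(l), Some(r))
        case (None, Some(r))              => DiffRow("I", k, None, Some(r))
        case (Some(l), None)              => DiffRow("D", k, Some(l), None)
        // Cannot happen: every key comes from at least one of the two maps.
        case (None, None)                 => throw new IllegalStateException("unreachable")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val left  = Map(1 -> "one", 2 -> "two", 3 -> "three")
    val right = Map(1 -> "one", 2 -> "Two", 4 -> "four")
    // Prints one DiffRow per key: N for 1, C for 2, D for 3, I for 4.
    diff(left, right).foreach(println)
  }
}
```

Applying every change in the result to the left collection yields the right one: drop the D rows, add the I rows, and replace the C rows.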
Add this line to your `build.sbt` file:

    libraryDependencies += "uk.co.gresearch.spark" %% "spark-extension" % "1.0.0"
Add this dependency to your `pom.xml` file:

    <dependency>
      <groupId>uk.co.gresearch.spark</groupId>
      <artifactId>spark-extension_2.12</artifactId>
      <version>1.0.0</version>
    </dependency>