AbsaOSS/atum

Suggestion: Remove support for Scala 2.12+Spark2.4

Closed · 1 comment

Currently, we have three build configurations defined in the workflows and the build-all scripts:

  1. Spark 2.4@Scala 2.11
  2. Spark 2.4@Scala 2.12
  3. Spark 3.1@Scala 2.12

I suggest we drop support for the Spark 2.4@Scala 2.12 combination while keeping the other two. My reasoning is as follows:

  • While options 1 and 3 are common (usage is gradually shifting from Spark 2.4@Scala 2.11 to Spark 3.1@Scala 2.12), option 2, Spark 2.4@Scala 2.12, is rare: apart from custom builds, it only exists as the prebuilt Spark 2.4.2 distribution. Real-life use of this combination should therefore be minimal.
  • The cross-build process defined in the build-all* scripts effectively overwrites the installed Spark 2.4@Scala 2.12 binaries in the repository with the binaries built for Spark 3.1@Scala 2.12, because Atum artifacts are keyed only by Scala version, not by Spark version. This makes the Spark 2.4@Scala 2.12 build hard to use in practice unless you explicitly build and install only that configuration. If we wanted to keep it, this could be solved by changing the artifact naming/versioning scheme to include the Spark version as well.
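To illustrate the collision described in the second point, here is a minimal Scala sketch. It models Maven-style coordinates for the three build configurations above under two naming schemes: one that encodes only the Scala binary version (the "_2.12"-style suffix commonly used for Scala artifacts) and one that also encodes the Spark version. The group ID, the artifact name "mylib", the version and the Spark-suffix format are all hypothetical placeholders, not Atum's real coordinates or a decided naming scheme.

```scala
// Hypothetical sketch: compares two artifact naming schemes for a cross-built library.
// GroupId, BaseName, Version and the "-sparkX.Y" suffix are placeholders, not Atum's real values.
object ArtifactNaming {
  final case class Coordinates(groupId: String, artifactId: String, version: String)

  private val GroupId  = "example.group" // hypothetical
  private val BaseName = "mylib"         // hypothetical
  private val Version  = "1.0.0"         // hypothetical

  // Current-style scheme: only the Scala binary version ends up in the artifact name;
  // sparkMajorMinor is deliberately ignored, which is exactly what causes the collision.
  def scalaOnly(scalaBinary: String, sparkMajorMinor: String): Coordinates =
    Coordinates(GroupId, s"${BaseName}_$scalaBinary", Version)

  // Alternative scheme: the Spark major.minor version is encoded in the artifact name too.
  def scalaAndSpark(scalaBinary: String, sparkMajorMinor: String): Coordinates =
    Coordinates(GroupId, s"$BaseName-spark${sparkMajorMinor}_$scalaBinary", Version)

  // The three build configurations from the issue, as (Spark, Scala) pairs.
  val builds: Seq[(String, String)] = Seq("2.4" -> "2.11", "2.4" -> "2.12", "3.1" -> "2.12")

  // Computes the coordinates each build would be installed under for a given scheme.
  def artifacts(scheme: (String, String) => Coordinates): Seq[Coordinates] =
    builds.map { case (spark, scala) => scheme(scala, spark) }

  def main(args: Array[String]): Unit = {
    val current  = artifacts(scalaOnly)
    val proposed = artifacts(scalaAndSpark)
    // Both Scala 2.12 builds map to the same coordinates, so the later install overwrites the earlier one.
    println(s"Scala-only naming:  ${current.distinct.size} distinct artifacts for ${builds.size} builds")
    // With the Spark version in the name, every build gets its own coordinates.
    println(s"Scala+Spark naming: ${proposed.distinct.size} distinct artifacts for ${builds.size} builds")
    proposed.foreach(c => println(s"  ${c.groupId}:${c.artifactId}:${c.version}"))
  }
}
```

Running it reports 2 distinct artifacts for 3 builds under the Scala-only scheme and 3 under the Scala+Spark scheme, which is why one of the two Scala 2.12 builds ends up overwritten in the local repository with the current naming.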

Let's discuss whether there are any reasons against it.

Based on internal discussions, we agreed to proceed as outlined: remove the Spark 2.4@Scala 2.12 option from both the build scripts and the workflows.