AbsaOSS/spark-hofs

Compatibility with Spark 2.4.1

Describe the bug

Currently, 12 tests fail when building against Spark 2.4.1.
For example:

- transform function with anonymous variables and an index *** FAILED ***
  org.apache.spark.sql.AnalysisException: cannot resolve '`elm`' given input columns: [array];;
'Project [transform(array#84, lambdafunction(('elm + 'idx), lambda elm#87, lambda idx#88, false)) AS transform(array, lambdafunction((elm + idx), elm, idx))#86]
+- Project [value#82 AS array#84]
   +- LocalRelation [value#82]
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:110)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
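
For context, here is a minimal sketch of the kind of call the failing test exercises. It is written against the spark-hofs API, but the indexed transform variant and the variable names elm/idx are assumptions read off the plan in the error above, so the actual test code may look different.

import org.apache.spark.sql.{Column, SparkSession}
import za.co.absa.spark.hofs._

object TransformWithIndexRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("spark-hofs-repro").getOrCreate()
    import spark.implicits._

    // A single array column named "array", matching the plan in the error above.
    val df = Seq(Seq(1, 2, 3)).toDF("array")

    // Add each element's index to the element. On Spark 2.4.1 the lambda variables
    // fail to resolve during analysis, producing the AnalysisException shown above.
    val result = df.select(transform($"array", (elm: Column, idx: Column) => elm + idx))
    result.show()

    spark.stop()
  }
}

On Spark 2.4.3+ the same query is expected to resolve and run.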

To Reproduce

  1. Change the Spark version to 2.4.1 in the project pom file (see the sketch after these steps).
  2. Run 'mvn clean test'.
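
Step 1 is sketched below, assuming the Spark version used for the build is wired through a single property in the pom; the property name shown is hypothetical and the project's pom may declare it differently.

<properties>
  <!-- hypothetical property name; the project's pom may declare the Spark version differently -->
  <spark.version>2.4.1</spark.version>
</properties>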

Expected behaviour

The library should work with Spark 2.4.1.

Newer spark-hofs releases won't be compatible with Spark 2.4.1, because Spark 2.4.2 introduced an internal change to the way lambda variables are handled. Unfortunately, this change is breaking for spark-hofs, so newer versions of spark-hofs will support only Spark 2.4.3+. To use spark-hofs with older versions of Spark, use an older version of spark-hofs according to the compatibility table in the README.
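
For anyone who must stay on Spark 2.4.2 or older, a hedged sketch of pinning an older spark-hofs release in a Maven build follows; the groupId/artifactId are assumed from the project's naming, and x.y.z is a placeholder to be replaced with the release listed in the README compatibility table.

<dependency>
  <!-- assumed coordinates; check the README for the exact group and artifact names -->
  <groupId>za.co.absa</groupId>
  <artifactId>spark-hofs_2.11</artifactId>
  <!-- placeholder: use the version listed for your Spark release in the compatibility table -->
  <version>x.y.z</version>
</dependency>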