
Fix tests for Spark 3.5.0


Describe the bug

Executing the tests fails with Spark 3.5.0.

To Reproduce

Steps to reproduce the behavior:

  1. Check out the latest master
  2. Change the Spark version in pom.xml to 3.5.0
  3. Run mvn clean test (using Java 8)
  4. See the errors below
[ERROR] /ABRiS/src/main/scala/za/co/absa/abris/examples/ConfluentKafkaAvroWriter.scala:88: error: value apply is not a member of object org.apache.spark.sql.catalyst.encoders.RowEncoder
[ERROR]     RowEncoder.apply(sparkSchema)
[ERROR]                ^
[ERROR] one error found

This error can be fixed by replacing RowEncoder.apply with org.apache.spark.sql.Encoders.row, as sketched below.
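
A minimal sketch of the replacement, assuming sparkSchema is the StructType that is passed to RowEncoder.apply in ConfluentKafkaAvroWriter.scala; the helper name rowEncoder is only illustrative, and Encoders.row is available as of Spark 3.5.0:

    import org.apache.spark.sql.{Encoder, Encoders, Row}
    import org.apache.spark.sql.types.StructType

    // before (no longer compiles against Spark 3.5.0):
    //   val encoder: Encoder[Row] = RowEncoder.apply(sparkSchema)
    // after:
    def rowEncoder(sparkSchema: StructType): Encoder[Row] =
      Encoders.row(sparkSchema)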

The next error is:

SchemaLoaderSpec:
SchemaLoader
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/exc/StreamConstraintsException
  at com.fasterxml.jackson.databind.node.JsonNodeFactory.objectNode(JsonNodeFactory.java:353)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
  at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
  at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4801)
  at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3084)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
  at all_types.test.NativeSimpleOuter.<clinit>(NativeSimpleOuter.java:18)
  at za.co.absa.abris.examples.data.generation.TestSchemas$.<init>(TestSchemas.scala:35)

This can be fixed, e.g., by explicitly setting the jackson-core dependency to version 2.15.2, thereby overriding the 2.12.2 pulled in by Avro 1.10.2. Spark 3.5.0 depends on jackson-databind 2.15.2, which references StreamConstraintsException, a class that only exists in jackson-core 2.15 and later.

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.15.2</version>
        </dependency>

Expected behavior

The tests should pass for the current Spark versions 3.2.4, 3.3.3, 3.4.2, and 3.5.0. These versions should be added to the test-and-verify GitHub Action.

Additional context

If you instead replace RowEncoder.apply with RowEncoder.encoderFor, the following exception is thrown in 18 tests.

- should convert all types of data to confluent avro an back using schema registry for key *** FAILED ***
  org.apache.spark.SparkRuntimeException: Only expression encoders are supported for now.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedEncoderError(QueryExecutionErrors.scala:477)
  at org.apache.spark.sql.catalyst.encoders.package$.encoderFor(package.scala:34)
  at org.apache.spark.sql.catalyst.plans.logical.CatalystSerde$.generateObjAttr(object.scala:47)
  at org.apache.spark.sql.execution.ExternalRDD$.apply(ExistingRDD.scala:35)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:498)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:367)
  at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:236)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.getTestingDataFrame(CatalystAvroConversionSpec.scala:55)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.$anonfun$new$23(CatalystAvroConversionSpec.scala:484)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  ...

The fix is again to replace RowEncoder.encoderFor with org.apache.spark.sql.Encoders.row, as mentioned in https://issues.apache.org/jira/browse/SPARK-45311; see the sketch below.
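
A hedged sketch of creating the test DataFrame with an Encoder[Row] obtained from Encoders.row, instead of the encoder returned by RowEncoder.encoderFor; the names spark, rowRDD, and schema are illustrative and do not match the actual getTestingDataFrame signature:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Encoders, Row, SparkSession}
    import org.apache.spark.sql.types.StructType

    // createDataset needs an ExpressionEncoder, which Encoders.row appears to
    // provide in Spark 3.5.0, whereas the encoder from RowEncoder.encoderFor
    // triggers "Only expression encoders are supported for now".
    def toTestingDataFrame(spark: SparkSession, rowRDD: RDD[Row], schema: StructType): DataFrame =
      spark.createDataset(rowRDD)(Encoders.row(schema)).toDF()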

If you get

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x74ad2091) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x74ad2091

run the tests with Java 8, or add the VM option --add-exports java.base/sun.nio.ch=ALL-UNNAMED.