holdenk/spark-testing-base

Codec [lz4] is not available. Consider setting spark.io.compression.codec=snappy

surajnakka opened this issue

When we use DatasetSuiteBase in unit tests and run operations on DataFrames in parallel, we get the error below. If we remove the parallel operations on the DataFrames, the tests pass.

[info] java.lang.IllegalArgumentException: Codec [lz4] is not available. Consider setting spark.io.compression.codec=snappy
[info] at org.apache.spark.io.CompressionCodec$.$anonfun$createCodec$2(CompressionCodec.scala:92)
[info] at scala.Option.getOrElse(Option.scala:189)
[info] at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:91)
[info] at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:78)
[info] at org.apache.spark.sql.execution.SparkPlan.decodeUnsafeRows(SparkPlan.scala:369)
[info] at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:497)
[info] at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:429)
[info] at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
[info] at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715)
[info] at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728)

Because DatasetSuiteBase creates the SparkContext for us automatically, we cannot set spark.io.compression.codec to snappy: the setting has to be in place before the SparkContext is created.
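For reference, the place where such a setting would have to go is the SparkConf used to build the context. The sketch below assumes that the spark-testing-base version in use exposes an overridable `def conf: SparkConf` (via its `SparkContextProvider` trait) that is read before the shared SparkContext/SparkSession is built. That is an assumption to check against the installed version, not a confirmed fix. `SnappyCodecSuite` is a hypothetical name, and `AnyFunSuite` assumes ScalaTest 3.1 or later.

```scala
import com.holdenkarau.spark.testing.DatasetSuiteBase
import org.apache.spark.SparkConf
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite. ASSUMPTION: DatasetSuiteBase exposes an overridable
// `conf: SparkConf` that is used when the shared context is first created.
class SnappyCodecSuite extends AnyFunSuite with DatasetSuiteBase {

  // Extend the default test configuration rather than replacing it,
  // so master, appName and the other defaults are kept.
  override def conf: SparkConf =
    super.conf.set("spark.io.compression.codec", "snappy")

  test("the shared context picks up the snappy codec") {
    // Only holds if this suite's conf was used to build the context;
    // a context reused from an earlier suite keeps its original settings.
    assert(spark.sparkContext.getConf.get("spark.io.compression.codec") == "snappy")
  }
}
```

If spark-testing-base reuses a SparkSession that another suite already started, the override above may have no effect, which would be consistent with the setting only mattering "before the spark context is created".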