Azure/spark-cdm-connector

Databricks Spark 2.4: java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions

Closed this issue · 1 comment

I installed the jar library on a Databricks cluster, and now I can no longer read with the CDM connector.
This line of code:
```python
entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .load())
display(entity_df)
```

throws an error:
```
java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions

Py4JJavaError                Traceback (most recent call last)
in
      2     .option("storage", cdsStorageAccountName)
      3     .option("manifestPath", cdsContainer + manifest_path)
----> 4     .option("entity", table_name)
      5     .load())
      6 display(entity_df)
```

With the previous version, 0.19.1, this worked without error.
Databricks cluster configuration:
- Apache Spark 2.4.5, Scala 2.11
- spark.databricks.passthrough.enabled true
- spark.databricks.delta.preview.enabled true
- Workers: 2-8 (28-112 GB memory, 8-32 cores)
- Driver: 1 (14 GB memory, 4 cores)
- Runtime 6.4

Please see #107. You are using a Spark 3 build of the jar on a Spark 2 cluster, and those classes don't exist there, as expected: `SupportsCatalogOptions` was only introduced in Spark 3.0.
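To make the mismatch above concrete, here is a minimal sketch of how you might gate the connector choice on the cluster's Spark version before installing a jar. The function name and the returned labels are illustrative, not part of the connector's API; on a live cluster you would pass `spark.version` (e.g. `"2.4.5"`), and you should check the project's releases page for the actual artifact names.

```python
def pick_connector_line(spark_version: str) -> str:
    """Suggest which spark-cdm-connector line matches a Spark version string.

    Illustrative only: the labels returned here are placeholders, not
    real artifact coordinates.
    """
    major = int(spark_version.split(".")[0])
    if major >= 3:
        # Spark 3 builds target the DataSource V2 API, which includes
        # SupportsCatalogOptions (added in Spark 3.0).
        return "spark3-build"
    # Spark 2 clusters need an older connector line such as 0.19.x,
    # which predates SupportsCatalogOptions entirely.
    return "spark2-build (e.g. 0.19.1)"


# On Databricks Runtime 6.4 (Spark 2.4.5) this would steer you back
# to the 0.19.x line rather than a Spark 3 jar.
print(pick_connector_line("2.4.5"))
```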