Notice / Common Errors

Third Party Platforms

The jar provided in the releases is built and designed for Azure Synapse. The connector is free to use, but usage with third-party platforms is provided "as-is", with no guarantee of support or of it working on your platform. However, the code is open source, and contributions are welcome should you feel there are improvements to be made.

All releases are published at https://github.com/Azure/spark-cdm-connector/releases, not on Maven.

Example: Databricks

If you want to use Databricks, you will have to build the jar yourself or use the jars we provide in the releases. Credential passthrough will not work. As mentioned in #108: "I have received a reply from a Databricks team, they have informed that a credentials pass-through cannot be used in a scheduled tasks, so the problem is in Databricks, not library."

Credential passthrough

As referenced in #134, credential passthrough is a Synapse-specific feature. Use app registration or SAS token authentication if you are not using Synapse.
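
For illustration, here is a minimal read sketch using app registration (service principal) auth instead of passthrough. The option names follow the connector's documented read options; the storage account, manifest path, entity, and credential values are placeholders, not real settings:

```scala
// Assumes an existing SparkSession named `spark` (e.g. in a notebook).
// All values below are placeholders for your own storage account,
// manifest, entity, and app registration credentials.
val df = spark.read.format("com.microsoft.cdm")
  .option("storage", "<account>.dfs.core.windows.net")
  .option("manifestPath", "<container>/root.manifest.cdm.json")
  .option("entity", "Customer")
  .option("appId", "<app-registration-client-id>")  // service principal auth
  .option("appKey", "<client-secret>")
  .option("tenantId", "<aad-tenant-id>")
  .load()
```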

CDM Connector save change

If you upgrade from Spark 2 to Spark 3, the CDM connector's save behaves differently. When the entity or the manifest does not exist, a DataFrame write with SaveMode.Append or SaveMode.Overwrite throws an error like NoSuchTableException: Manifest doesn't exist. root.manifest.cdm.json.

The solution is to remove the .mode(SaveMode.Append/Overwrite) call, as in the sketch below.
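
A before/after sketch (same assumed option names and placeholder values as the read example above):

```scala
// Fails on Spark 3 if root.manifest.cdm.json does not exist yet:
//   df.write.format("com.microsoft.cdm")
//     .option(...)                  // same options as below
//     .mode(SaveMode.Append)        // throws NoSuchTableException
//     .save()

// Works: omit .mode(...) so the connector can create the manifest/entity.
df.write.format("com.microsoft.cdm")
  .option("storage", "<account>.dfs.core.windows.net")
  .option("manifestPath", "<container>/root.manifest.cdm.json")
  .option("entity", "Customer")
  .option("appId", "<app-registration-client-id>")
  .option("appKey", "<client-secret>")
  .option("tenantId", "<aad-tenant-id>")
  .save()
```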

Jar "doesn't work" / java.lang.NoClassDefFoundError

See the first point. If you have a Spark cluster and get errors like these, you probably used the wrong version of the jar. Some of the referenced classes exist only in Spark 2 or only in Spark 3, hence the error. The table below maps jar versions to Spark versions. If you don't know which version of the CDM connector you have, run the Scala expression com.microsoft.cdm.BuildInfo.version (see the snippet after the table).

| CDM connector version | Spark version |
| --------------------- | ------------- |
| 0.x                   | 2.4           |
| spark3.1-1.x          | 3.1.x         |
| spark3.2-1.x          | 3.2.x         |
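
For example, in a Scala notebook cell:

```scala
// Prints the connector's build version so you can match it against the table.
println(com.microsoft.cdm.BuildInfo.version)
```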

Example scenarios:

  • Spark 3 jar with a Spark 2 cluster:
    java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport

  • Spark 2 jar with a Spark 3 cluster:
    java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/SupportsCatalogOptions

Reading a table gives: java.util.NoSuchElementException

See issue #138: "Spark 3.3: Reading a table gives: java.util.NoSuchElementException: None.get".