Azure/spark-cdm-connector

Is there any way to use the CDM connector in Delta Live Tables?


Did you read the pinned issues and search the error message?

Yes, but I didn't find the answer.

Summary of issue

I am trying to use the CDM connector to build a Delta Live Table (DLT) on top of it
with the following code:

import dlt

@dlt.table
def dlt_table():
    # service principal credentials (redacted)
    appID = ""
    appKey = ""
    tenantID = ""
    return (spark.read.format("com.microsoft.cdm")
            .option("storage", "sa.dfs.core.windows.net")
            .option("manifestPath", "cdm/default.manifest.cdm.json")
            .option("entity", "Account")
            .option("appId", appID)
            .option("appKey", appKey)
            .option("tenantId", tenantID)
            .load())

When executing the code, I get the following error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o802.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.cdm. Please find packages at https://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:837)

After a little research, I found an article suggesting that it is not possible to install Java libraries in DLT pipelines. As the CDM connector is only available as a .jar file, I was wondering whether there is another way to use it in combination with DLT?

Error stack trace

No response

Platform name

Azure Databricks

Spark version

Databricks Runtime 12.2 (Spark 3.3)

CDM jar version

3.3-1.19.5

What is the format of the data you are trying to read/write?

.csv

The library is only offered as a jar, and since DLT pipelines cannot load custom Java libraries, the error java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.cdm is expected, per the documentation you linked.
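
For anyone hitting the same limitation, one possible workaround is to perform the CDM read outside the pipeline and stage the result where DLT can pick it up. This is only a sketch, not an officially supported pattern: it assumes a regular (non-DLT) job cluster with the CDM connector jar installed, and the staging path and table name below are hypothetical placeholders.

# Step 1: run on a regular job cluster that has the CDM connector jar installed.
# Credentials as in the snippet above; the staging path is a placeholder.
appID, appKey, tenantID = "", "", ""
staging_path = "abfss://staging@sa.dfs.core.windows.net/cdm/account"

(spark.read.format("com.microsoft.cdm")
    .option("storage", "sa.dfs.core.windows.net")
    .option("manifestPath", "cdm/default.manifest.cdm.json")
    .option("entity", "Account")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .load()
    .write.format("delta")
    .mode("overwrite")
    .save(staging_path))

# Step 2: inside the DLT pipeline, read the staged Delta data.
# No custom jar is needed for this read.
import dlt

@dlt.table
def account():
    return spark.read.format("delta").load(staging_path)

The point of this split is simply that the custom data source runs where jars can be installed, while the DLT pipeline only reads a standard Delta path.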