Azure/spark-cdm-connector

Databricks batch mode - AzureCredentialNotFoundException: Could not find ADLS Gen2 Token.

Closed this issue · 2 comments

Hello,

I have a problem with the CDM connector running in batch mode (as a workflow).
When run manually, it works without errors.
When run as a scheduled task, this piece of code

entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .load())
display(entity_df)

throws an error:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException:
Could not find ADLS Gen2 Token

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
      2     .option("storage", cdsStorageAccountName)
      3     .option("manifestPath", cdsContainer + manifest_path)
----> 4     .option("entity", table_name)
      5     .load())
      6 display(entity_df)

I have checked the mounts and they work normally (using OAuth):
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "",
    "fs.azure.account.oauth2.client.secret": "",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com//oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true"
}

dbutils.fs.mount(
    source = "URL",
    mount_point = "/mnt/",
    extra_configs = configs)

Cluster:
Spark 2.4.5, Scala 2.11
2-8 Workers, 28-112 GB Memory, 8-32 Cores
1 Driver, 14 GB Memory, 4 Cores
Runtime: 6.4.x-esr-scala2.11
Option "Enable credential passthrough for user-level data access" is activated.

What could be the reason for this?

P.S.: The e-mail address asksparkcdm@microsoft.com is not reachable:
The aniketsteam group only accepts messages from people who are within their organization or on their allowed senders list, and your email address is not on the list.

You can't use credential passthrough in non-interactive mode (e.g., in a scheduled task), and it takes precedence over service principal credentials provided in the Spark config.
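
For batch jobs, one possible workaround is to pass explicit service principal credentials to the connector instead of relying on passthrough. Below is a minimal sketch, assuming the connector's appId/appKey/tenantId options for explicit authentication; the secret scope and key names are hypothetical and would need to be created beforehand.

# Sketch: read a CDM entity in a scheduled job using an explicit service principal
# instead of credential passthrough. Secret scope/key names below are hypothetical.
app_id = dbutils.secrets.get(scope="cdm-scope", key="sp-app-id")
app_key = dbutils.secrets.get(scope="cdm-scope", key="sp-app-key")
tenant_id = dbutils.secrets.get(scope="cdm-scope", key="sp-tenant-id")

entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .option("appId", app_id)        # service principal (app registration) client ID
    .option("appKey", app_key)      # service principal client secret
    .option("tenantId", tenant_id)  # Azure AD tenant ID
    .load())
display(entity_df)

Storing the secret in a Databricks secret scope keeps it out of the notebook source, and the same options work regardless of whether the cluster runs interactively or as a job.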

I have received a reply from the Databricks team; they confirmed that credential passthrough cannot be used in scheduled tasks, so the problem lies in Databricks, not in the library.