Azure/spark-cdm-connector

Databricks batch mode - AzureCredentialNotFoundException: Could not find ADLS Gen2 Token.

Closed this issue · 2 comments

Hello,

I have a problem with the CDM connector running in batch mode (as a workflow).
When run manually, it works without errors.
When run as a scheduled task, this piece of code

entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .load())
display(entity_df)

throws an error:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException:
Could not find ADLS Gen2 Token

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
      2     .option("storage", cdsStorageAccountName)
      3     .option("manifestPath", cdsContainer + manifest_path)
----> 4     .option("entity", table_name)
      5     .load())
      6 display(entity_df)

I have checked the mounts and they work normally (using OAuth):
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "",
    "fs.azure.account.oauth2.client.secret": "",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com//oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true"
}

dbutils.fs.mount(
    source = "URL",
    mount_point = "/mnt/",
    extra_configs = configs)

Cluster:
Spark 2.4.5, Scala 2.11
2-8 Workers, 28-112 GB Memory, 8-32 Cores
1 Driver, 14 GB Memory, 4 Cores
Runtime: 6.4.x-esr-scala2.11
Option "Enable credential passthrough for user-level data access" is activated.

What could be the reason for this?

P.S.: The e-mail address asksparkcdm@microsoft.com is not reachable:
The aniketsteam group only accepts messages from people who are within their organization or on their allowed senders list, and your email address is not on the list.

You can't use credential passthrough in non-interactive mode (e.g., in a scheduled task), and it takes precedence over service principal credentials provided in the Spark config.
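
For batch jobs, one possible workaround is to pass explicit service principal credentials to the connector instead of relying on passthrough. Below is a minimal sketch, assuming the connector's appId/appKey/tenantId options for explicit authentication; the secret scope and key names are hypothetical and would need to be created beforehand.

# Sketch: read a CDM entity in a scheduled job using an explicit service principal
# instead of credential passthrough. Secret scope/key names below are hypothetical.
app_id = dbutils.secrets.get(scope="cdm-scope", key="sp-app-id")
app_key = dbutils.secrets.get(scope="cdm-scope", key="sp-app-key")
tenant_id = dbutils.secrets.get(scope="cdm-scope", key="sp-tenant-id")

entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .option("appId", app_id)        # service principal (app registration) client ID
    .option("appKey", app_key)      # service principal client secret
    .option("tenantId", tenant_id)  # Azure AD tenant ID
    .load())
display(entity_df)

Storing the secret in a Databricks secret scope keeps it out of the notebook source, and the same options work regardless of whether the cluster runs interactively or as a job.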

I have received a reply from the Databricks team; they confirmed that credential passthrough cannot be used in scheduled tasks, so the problem lies in Databricks, not in the library.