databrickslabs/dbx

Authentication on azure blob storage

Closed this issue · 8 comments

Hey Folks,

I'm having trouble uploading an artifact to a blob storage on Azure. I think it is a simple authentication problem, but I can't figure it out, and the documentation regarding this problem is sparse. I would very much appreciate a hint.

My project.json looks like the following:

{
  "environments": {
    "default": {
      "profile": "DEFAULT",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/appfigures-wasbs2",
        "artifact_location": "wasbs://artifact@xxxxx.blob.core.windows.net/appfigures"
      }
    }
  },
  "inplace_jinja_support": false,
  "failsafe_cluster_reuse_with_assets": false,
  "context_based_upload_for_execute": false
}

and the error is as follows:

DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
        SharedTokenCacheCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2023-01-03T10:17:27.4356604Z and was inactive for 90.00:00:00.
Trace ID: xxxx
Correlation ID: xxxx
Timestamp: 2023-07-12 13:36:20Z'
Content: {"error":"invalid_grant","error_description":"AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2023-01-03T10:17:27.4356604Z and was inactive for 90.00:00:00.\r\nTrace ID:xxxxx\r\nCorrelation ID: xxxxxf\r\nTimestamp: 2023-07-12 13:36:20Z","error_codes":[700082],"timestamp":"2023-07-12 13:36:20Z","trace_id":"2103a8ee-020d-40d4-8348-24dd037c0e00","correlation_id":"ca09f558-dc98-484d-b21c-a36501e08c1f","error_uri":"https://login.microsoftonline.com/error?code=700082"}
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
        SharedTokenCacheCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2023-01-03T10:17:27.4356604Z and was inactive for 90.00:00:00.

Thanks again for this great tool!

hi @panoptikum ,

it seems like you're not authenticated to Azure, or the token is expired, or you don't have permissions to upload/download on the provided wasbs path. Could you please try first:

az login 

And then:

dbx <some command>

?

Hi @renardeinside ,

Thank you for your quick reply. I am fairly certain that I am authenticated to Azure, and I can launch jobs with dbx launch, but here is the output of the two commands:

az login
A web browser has been opened at https://login.microsoftonline.com/organizations/oauth2/v2.0/authorize. Please continue the login in the web browser. If no web browser is available or if the web browser fails to open, use device code flow with `az login --use-device-code`.
[
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "xxx8",
    "id": "xxx",
    "isDefault": true,
    "managedByTenants": [
      {
        "tenantId": "xxx" 
      }
    ],
    "name": "BigData / AI",
    "state": "Enabled",
    "tenantId": "xxxx",    
    "user": {
      "name": "xxx",
      "type": "user"
    }
  }
]

AND

dbx execute
[dbx][2023-07-17 17:25:13.574] 🔎 Deployment file is not provided, searching in the conf directory
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\xxx\Miniconda3\lib\runpy.py:197 in _run_module_as_main                               │
│                                                                                                  │
│   194 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   195 │   if alter_argv:                                                                         │
[...]


FileNotFoundError: Auto-discovery was unable to find any deployment file in the conf directory. Please provide file name via --deployment-file option

Which authentication mode does dbx use, and how do I have to configure my storage? Azure recommends v2 storage, but can I access this type with wasb as well? I am a bit confused and wondering whether I have to create a v1 storage account, which Microsoft has more or less deprecated.

Thanks again for your help!

hi @panoptikum ,
the error you're seeing is unrelated to azure storage auth now - it says that there is no deployment file (which is the error you'll see both in dbx deploy and dbx execute).

Please read the quickstart doc first, as well as the artifact storage explanation, for a clear picture of how Azure storage is used with dbx.

I'll close the ticket as of now, since the root problem has nothing to do with auth on Azure blob.

Hi @renardeinside ,
Sorry, I made a mistake. I ran the command in the wrong folder, one that contains no dbx project. I will give an update with the correct output tomorrow. I still believe it is an auth problem.

Hi @renardeinside ,

can you please reconsider or reopen the issue with the following output. Thank you.

One question I have: do I need to create an Azure storage v1 account, or does the wasbs protocol also work on v2 storage? I'm asking because I am a bit reluctant to use v1, as it is deprecated by Microsoft.

(base) PS C:\Users\xxx\Projekte\xxx> dbx deploy
[dbx][2023-07-18 15:06:18.906] 🔎 Deployment file is not provided, searching in the conf directory
[dbx][2023-07-18 15:06:18.906] 💡 Auto-discovery found deployment file conf\deployment.yml
[dbx][2023-07-18 15:06:18.906] 🆗 Deployment file conf\deployment.yml exists and will be used for deployment
[dbx][2023-07-18 15:06:18.922] Starting new deployment for environment default
[dbx][2023-07-18 15:06:18.937] Using profile provided from the project file
[dbx][2023-07-18 15:06:18.950] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2023-07-18 15:06:18.951] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2023-07-18 15:06:18.952] Profile DEFAULT will be used for deployment
[dbx][2023-07-18 15:06:21.072] No build logic defined in the deployment file. Default pip-based build logic will be used.
[dbx][2023-07-18 15:06:21.072] Reading the build config section first to identify build steps
[dbx][2023-07-18 15:06:21.072] Following the provided build logic
[dbx][2023-07-18 15:06:21.077] 🐍 Building a Python-based project
[dbx][2023-07-18 15:06:21.078] 🧹 Standard package folder  dist already exists, cleaning it before Python package build
[dbx][2023-07-18 15:06:29.778] ✅ Python-based project build finished
[dbx][2023-07-18 15:06:29.780] 🔄 Build process finished, reloading the config to catch changes if any
[dbx][2023-07-18 15:06:29.791] No build logic defined in the deployment file. Default pip-based build logic will be used.
[dbx][2023-07-18 15:06:29.793] All available workflows were selected for further operations: ['appfigures']
[dbx][2023-07-18 15:06:29.794] Locating package file
[dbx][2023-07-18 15:06:29.796] Package file located in: dist\xxx-0.0.1-py3-none-any.whl
[dbx][2023-07-18 15:06:30.619] Starting the traversal process
[dbx][2023-07-18 15:06:30.621] Processing libraries for task ingest
[dbx][2023-07-18 15:06:30.623] ✅ Processing libraries for task ingest - done
[dbx][2023-07-18 15:06:30.624] Processing libraries for task standardize-products
[dbx][2023-07-18 15:06:30.625] ✅ Processing libraries for task standardize-products - done
[dbx][2023-07-18 15:06:30.626] Processing libraries for task standardize-ratings
[dbx][2023-07-18 15:06:30.628] ✅ Processing libraries for task standardize-ratings - done
[dbx][2023-07-18 15:06:30.629] Processing libraries for task standardize-reviews
[dbx][2023-07-18 15:06:30.630] ✅ Processing libraries for task standardize-reviews - done
[dbx][2023-07-18 15:06:30.632] Processing libraries for task standardize-sales-downloads
[dbx][2023-07-18 15:06:30.635] ✅ Processing libraries for task standardize-sales-downloads - done
[dbx][2023-07-18 15:06:30.637] Traversal process finished, all provided references were resolved
[dbx][2023-07-18 15:06:30.638] 🤖 Applying workflow definitions via API
[dbx][2023-07-18 15:06:30.973] 🪄  Updating existing workflow with name appfigures and id:  999990853089277
[dbx][2023-07-18 15:06:31.353] ✅ Applying workflow definitions - done
DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
        SharedTokenCacheCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2023-01-03T10:17:27.4356604Z and was inactive for 
90.00:00:00.
Trace ID: xxx
Correlation ID: xxx
Timestamp: 2023-07-18 13:06:32Z'
Content: {"error":"invalid_grant","error_description":"AADSTS700082: The refresh token has expired due to inactivity. The token was issued on xxxx and was inactive for 90.00:00:00.\r\nTrace ID: xxx\r\nCorrelation ID: xxxx\r\nTimestamp: 2023-07-18 13:06:32Z","error_codes":[700082],"timestamp":"2023-07-18 13:06:32Z","trace_id":xxxx","correlation_id":"xxx","error_uri":"https://login.microsoftonline.com/error?code=700082"}
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.

hi @panoptikum ,

now I see the issue, but it's still unrelated to dbx (at least as it seems to me).

My guess would be that you have some old env variables that actually prevent the usage of the up-to-date token that you obtain from:

az login

Could you please verify that after running az login you can perform any operations on the ADLS container mentioned in the .dbx/project.json? e.g. run this command:

az storage blob directory list -c MyContainer -d DestinationDirectoryPath --account-name MyStorageAccount
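
The placeholders in that command (MyContainer, DestinationDirectoryPath, MyStorageAccount) correspond to parts of the artifact_location URI in project.json. A minimal Python sketch of that mapping, assuming the usual wasbs://<container>@<account>.blob.core.windows.net/<path> layout; the helper name split_wasbs_uri is made up for illustration and is not part of dbx:

```python
from urllib.parse import urlparse


def split_wasbs_uri(uri: str) -> dict:
    """Split wasbs://<container>@<account>.blob.core.windows.net/<path> into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb(s) URI: {uri}")
    if not parsed.username or not parsed.hostname:
        raise ValueError(f"expected <container>@<account host> in: {uri}")
    return {
        # The part before '@' is the container name.
        "container": parsed.username,
        # The first label of the host is the storage account name.
        "account": parsed.hostname.split(".")[0],
        # The remainder is the directory (blob name prefix) inside the container.
        "path": parsed.path.lstrip("/"),
    }


parts = split_wasbs_uri("wasbs://artifact@xxxxx.blob.core.windows.net/appfigures")
print(parts)
```

For the URI from this thread, the container is artifact, the account is xxxxx (redacted in the original post), and the directory is appfigures.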

Hi @renardeinside ,

Yes, I completely agree that this issue is not related to dbx, but I could not figure it out by myself. Your hint was very good. Listing the directory succeeded only after I set the environment variable AZURE_STORAGE_AUTH_MODE to "login". However, neither dbx nor the Azure Python SDK picked up this authentication mode. I was still getting this error:

DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
        SharedTokenCacheCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2023-01-03T10:17:27.4356604Z and was inactive for 90.00:00:00.
Trace ID: eb55cb82-91e0-4394-85ea-0dd5be7e0600
Timestamp: 2023-07-19 08:15:42Z'

However, I consulted the mentioned page (https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot) again, and once I set the environment variables AZURE_USERNAME and AZURE_PASSWORD, it seemed to work as expected. I will play around a bit more during the day, but I'm optimistic.

However, either dbx or Azure (I cannot tell which) still emits an info/warning message:

Incomplete environment configuration. These variables are set: AZURE_PASSWORD, AZURE_USERNAME

Do you know why this environment configuration is considered incomplete? According to the troubleshoot guide of azure it should be sufficient to set these two variables.
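
One possible explanation, offered as a hedged note: to my understanding of the azure-identity documentation at the time of this thread, EnvironmentCredential looks for one of three complete groups of variables, and the user/password group also requires AZURE_CLIENT_ID. The sketch below only inspects environment variables and calls no Azure API; the names CREDENTIAL_GROUPS and missing_variables are illustrative, and the groups should be checked against the docs for the installed SDK version.

```python
import os

# Variable groups accepted by EnvironmentCredential, as I understand the
# azure-identity docs (an assumption; verify for your SDK version).
CREDENTIAL_GROUPS = {
    "service principal with secret": (
        "AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET"),
    "service principal with certificate": (
        "AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_CERTIFICATE_PATH"),
    "user with password": (
        "AZURE_CLIENT_ID", "AZURE_USERNAME", "AZURE_PASSWORD"),
}


def missing_variables(env):
    """Return, per credential group, the required variables that are unset or empty."""
    return {
        name: [var for var in required if not env.get(var)]
        for name, required in CREDENTIAL_GROUPS.items()
    }


# Report on the current process environment.
for name, missing in missing_variables(os.environ).items():
    status = "complete" if not missing else "missing " + ", ".join(missing)
    print(f"{name}: {status}")
```

With only AZURE_USERNAME and AZURE_PASSWORD set, this reports AZURE_CLIENT_ID as missing for the user/password group, which would be consistent with the "Incomplete environment configuration" message.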

Unfortunately I don't have much expertise on Azure auth CLI setup (I've always used standard az login and it worked for me in 100% of cases).

What I know for sure is that dbx uses the standard Azure auth mechanism under the hood, exactly the same as in the az CLI.

Therefore any Azure auth related message is coming from this CLI/API and is just proxied via dbx.

Make sure that you can perform the following actions with pure az cli:

  • list the artifact directory
  • upload a file to it
  • download a file from it

If the issue on az level persists, please raise an issue in the relevant repo - https://github.com/Azure/azure-cli
If you can perform the actions above in az CLI and cannot do the same with dbx - please reopen this ticket.
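
The three checks above (list, upload, download) can be sketched as az CLI invocations. The Python snippet below only builds and prints the commands; it does not run them. It uses az storage blob list/upload/download with --auth-mode login rather than the az storage blob directory command quoted earlier in the thread, and the function name az_blob_checks, the file name probe.txt, and the ".check" suffix are placeholders chosen for illustration.

```python
import shlex


def az_blob_checks(account, container, directory, file_name):
    """Build (but do not run) az CLI commands to list, upload and download a blob."""
    common = ["--account-name", account, "--container-name", container,
              "--auth-mode", "login"]  # use the identity from `az login`
    blob_name = f"{directory}/{file_name}"
    return {
        "list": ["az", "storage", "blob", "list", *common,
                 "--prefix", f"{directory}/"],
        "upload": ["az", "storage", "blob", "upload", *common,
                   "--name", blob_name, "--file", file_name],
        "download": ["az", "storage", "blob", "download", *common,
                     "--name", blob_name, "--file", file_name + ".check"],
    }


# Values taken from the project.json in this thread (account name is redacted there).
for step, cmd in az_blob_checks("xxxxx", "artifact", "appfigures", "probe.txt").items():
    # shlex.join uses POSIX quoting; adjust quoting if pasting into PowerShell.
    print(f"{step}: {shlex.join(cmd)}")
```

If all three commands succeed in the same shell where dbx deploy fails, that points back at dbx; if they fail, the problem is on the Azure side.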