This code repository is showing how to make ODBC connection from Azure Databricks to Azure SQL Database with Azure AD user access token. Databricks doesn’t supported for Azure AD user passthrough authentication to Azure SQL Database and only supported for Azure Data Lake Storage (ADLS). Due to this limitation, we’re only able to use Azure AD Service Principle (or primitive SQL ID) to make authentication with Azure SQL Database. And we’re also required to setup (Databricks Secret Scope maybe also using Azure Key Vault-backed scope in advanced scenario) with corresponding access control to save the Azure AD Service Principle secrets. This make lots of administration and operation overhead on Databricks Secret Scope. And Azure AD Service Principle based authentication is also hard to trace/audit database access as Secret Scope can be shared.
To make our life easier, I’m trying to make interactive authentication with Azure AD from Databricks notebook and obtain the access token. And then use pyodbc to make token-based authentication with Azure SQL Database.
- Analyst use browser to interact Azure Databricks with Notebook.
- Initiate interactive authentication (with device code) from Notebook. Open up additional tab from browser (https://microsoft.com/devicelogin) to perform interactive authentication with Azure AD.
- After login successful, Notebook user session will get the user access token from Azure AD.
- Leverage collected access token to perform token-based authentication with Azure SQL Database via pyodbc library and MS-SQL driver, load SQL query into Panda dataframe, then covert Panda dataframe into Spark dataframe.
All data in this code repository is coming from classic dataset of diabetes classification.
Enjoy!