Shows how to use an external Hive metastore (hosted in SQL Server) along with ADLS Gen 1 as part of a Databricks initialization script that runs when the cluster is created.
- Create an Azure Databricks workspace in the Azure portal
- Open the workspace and click on your name
- Then select User Settings
- Create a new token (save the value)
- Download the Databricks CLI: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
- Configure CLI
databricks configure --token
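The CLI then prompts for the workspace URL and the token created above, roughly as follows (the host shown here is a placeholder for your own workspace URL):
Databricks Host (should begin with https://): https://YOUR-REGION.azuredatabricks.net
Token: PASTE-THE-TOKEN-VALUE-HERE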
- Update the values in external-metastore.sh
- This assumes you have a Hive metastore in SQL Server, an ADLS Gen 1 account, and a service principal
- You can create an HDInsight cluster and let it create the metastore for you
- Please check your Hive version and update external-metastore.sh accordingly
- external-metastore.sh also tells Databricks to skip metastore schema validation (a sketch of a typical script is shown after the upload commands below)
- Upload the external metastore script (run the lines below; they are safe to run repeatedly)
dbfs mkdirs dbfs:/databricks/init/
dbfs rm dbfs:/databricks/init/external-metastore.sh
dbfs cp external-metastore.sh dbfs:/databricks/init/external-metastore.sh
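For reference, below is a minimal sketch of what external-metastore.sh can look like, assuming a SQL Server-hosted metastore and service principal (OAuth) access to ADLS Gen 1. Every <PLACEHOLDER> is a value you must supply, and the config keys (especially the Hive version and metastore jars settings) should be checked against the Databricks docs linked at the bottom rather than copied verbatim.

#!/bin/sh
# Sketch of external-metastore.sh - replace every <PLACEHOLDER> with your own values.
# Writes a Spark config fragment on the driver that points the cluster at the
# external Hive metastore and supplies OAuth credentials for the ADLS Gen 1 account.
cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
[driver] {
  # JDBC connection to the Hive metastore database hosted in SQL Server
  "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://<SQL-SERVER>.database.windows.net:1433;database=<METASTORE-DB>"
  "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<SQL-USERNAME>"
  "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<SQL-PASSWORD>"

  # Must match the Hive version that created the metastore (e.g. the HDInsight default)
  "spark.sql.hive.metastore.version" = "<HIVE-VERSION>"
  "spark.sql.hive.metastore.jars" = "<HIVE-JAR-SOURCE, e.g. builtin or maven>"

  # Skip schema validation so Databricks does not try to verify or alter the metastore schema
  "spark.hadoop.hive.metastore.schema.verification" = "false"
  "spark.hadoop.datanucleus.fixedDatastore" = "true"
  "spark.hadoop.datanucleus.autoCreateSchema" = "false"

  # Service principal (OAuth) credentials for the ADLS Gen 1 account
  "spark.hadoop.dfs.adls.oauth2.access.token.provider.type" = "ClientCredential"
  "spark.hadoop.dfs.adls.oauth2.client.id" = "<SERVICE-PRINCIPAL-APPLICATION-ID>"
  "spark.hadoop.dfs.adls.oauth2.credential" = "<SERVICE-PRINCIPAL-KEY>"
  "spark.hadoop.dfs.adls.oauth2.refresh.url" = "https://login.microsoftonline.com/<TENANT-ID>/oauth2/token"
}
EOF

Once uploaded, dbfs ls dbfs:/databricks/init/ should list the script, and it will run on every cluster created after that.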
The following assumes you have a CSV file to view; otherwise just do a directory listing.
%scala
// Directory listing to confirm the cluster can reach the ADLS Gen 1 account
dbutils.fs.ls("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH")
// Read the sample CSV as plain text and display a few rows
val df = spark.read.text("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/SAMPLE-FILE.csv")
df.show()
%sql
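-- Tables registered in the external Hive metastore should be listed, confirming the metastore connection works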
show tables;
Reference docs:
- https://docs.databricks.com/user-guide/advanced/external-hive-metastore.html
- https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html