
Azure-Databricks-External-Hive-and-ADLS

Shows how to use an External Hive (SQL Server) along with ADLS Gen 1 as part of a Databricks initialization script that runs when the cluster is created.

Steps

  1. Create an Azure Databricks workspace in the Azure portal
  2. Open the workspace and click on your name
  3. Then select User Settings
  4. Create a new token (save the value)
  5. Download the Databricks CLI: https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
  6. Configure the CLI: databricks configure --token
  7. Update the values in external-metastore.sh
    • This assumes you have a Hive metastore in SQL Server, an ADLS Gen 1 account, and a service principal
    • You can create an HDInsight cluster and let that create your metastore
    • Check your Hive version and update external-metastore.sh to match
    • external-metastore.sh also disables metastore schema verification, so Databricks will not validate the Hive schema version (a notebook snippet to sanity-check these settings follows the upload commands below)
  8. Upload external-metastore.sh (run the lines below; you can rerun them as often as you like)
dbfs mkdirs dbfs:/databricks/init/
dbfs rm dbfs:/databricks/init/external-metastore.sh
dbfs cp external-metastore.sh dbfs:/databricks/init/external-metastore.sh
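After the cluster is created (or restarted) with the init script in place, a quick way to confirm the settings took effect is to read them back inside a notebook. This is a minimal sketch, assuming external-metastore.sh sets the standard spark.sql.hive.metastore.* and spark.hadoop.javax.jdo.* keys described in the Databricks docs referenced below; adjust the key names if your script uses different ones.

%scala
// Sketch only: assumes external-metastore.sh set these standard keys
// Hive client version Databricks uses against the external metastore
println(spark.conf.get("spark.sql.hive.metastore.version", "not set"))
// JDBC connection string for the SQL Server metastore
// (spark.hadoop.* settings are copied into the Hadoop configuration)
println(spark.sparkContext.hadoopConfiguration.get("javax.jdo.option.ConnectionURL", "not set"))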

To Test ADLS (run this in a notebook)

This assumes you have a CSV file at the path below; if you don't, the directory listing alone still verifies access.

%scala
// List the directory to verify the cluster can reach the data lake
dbutils.fs.ls("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH")
// Read a sample file as plain text and display its contents
val df = spark.read.text("adl://YOUR-DATA-LAKE-HERE.azuredatalakestore.net/DIRECTORY-PATH/SAMPLE-FILE.csv")
df.show()
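If your init script does not set the ADLS credentials cluster-wide, you can set them for the notebook session before running the cell above. A hedged sketch using the service principal OAuth keys from the Databricks ADLS docs: the runtimes of this era used the dfs.adls.* names (newer runtimes use fs.adl.*), and the id, key, and tenant values here are placeholders.

%scala
// Session-scoped ADLS Gen 1 credentials (placeholders, not real values)
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "YOUR-SERVICE-PRINCIPAL-APPLICATION-ID")
spark.conf.set("dfs.adls.oauth2.credential", "YOUR-SERVICE-PRINCIPAL-KEY")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/YOUR-TENANT-ID/oauth2/token")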

To Test External Hive (run this in a notebook)

%sql
show tables;
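To confirm the tables are really coming from the external metastore rather than a cluster-local one, create a throwaway table and check that it is listed; because the metadata lives in SQL Server, it should still appear after a cluster restart or from another cluster using the same init script. A sketch; the table name is made up.

%scala
// Create a throwaway table; its metadata should land in the SQL Server metastore
spark.sql("CREATE TABLE IF NOT EXISTS metastore_smoke_test (id INT)")
spark.sql("SHOW TABLES").show()
// If the external metastore is wired up, metastore_smoke_test will still be
// listed after restarting the cluster (or from another cluster using the same script)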

Reference

https://docs.databricks.com/user-guide/advanced/external-hive-metastore.html
https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html