Secure Databricks cluster with data exfiltration protection and Private Link for Storage, Key Vault, and Event Hub using Bicep.
Architecture and Key Features • To Do • How To Use • Credits • Support • Reference • License
Bicep is free, supported by Microsoft support, and a fun, easy, and productive way to build and deploy complex infrastructure on Azure. If you currently use ARM templates, you will love Bicep's simple syntax. Bicep also supports declaring existing resources. More resources are available at this link.
- Based on best practices from Azure Databricks Best Practices and template from Anti-Data-Exfiltration Reference architecture
- Hub and spoke VNETs.
- Databricks cluster created in the spoke VNET.
- Firewall with UDR to allow only the required Databricks endpoints.
- Storage account with private endpoint.
- Azure Key Vault with private endpoint.
- Create Databricks-backed secret scope.
- Azure Event Hub with private endpoint.
- Create cluster with cluster logging and init script for monitoring.
- Sample Databricks notebooks imported into the workspace.
- Secured Windows virtual machine with RDP (protects data from export).
- Configure Log Analytics workspace and collect metrics from Spark worker nodes.
- Configure diagnostic logging.
- Configure sending logs to Azure Monitor using mspnp/spark-monitoring.
- Configure Overwatch for fine-grained monitoring.
- Create an Azure ML workspace for the model registry and to assist in deploying models to AKS.
- Create AKS compute for AML for real-time model inference/scoring.
- Create Databricks secret scope backed by Azure Key Vault.
- Create Azure SQL with Private Link.
- Create an integrated ADF pipeline
- Integrate into Azure DevOps
- Create Databricks performance dashboards
- Create and configure External metastore
- Configure Databricks access to specific IP only
- More sample Databricks notebooks
- Add description to all parameters
- Managed Identity needs to be enabled as a resource provider inside Azure
- For the bash script, jq must be installed.
- Client PC password complexity requirements: the supplied password must be 8 to 123 characters long and must satisfy at least 3 of the following complexity requirements:
  - Contains an uppercase character
  - Contains a lowercase character
  - Contains a numeric digit
  - Contains a special character
  - Control characters are not allowed
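The password rules above can be checked locally before deploying. The following is a minimal sketch, not part of this repo: the function name check_password is hypothetical, and it assumes that "special character" means a punctuation character and that control characters are always rejected.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): returns 0 when the password
# satisfies the rules listed above, 1 otherwise.
check_password() {
  pw=$1
  classes=0
  len=${#pw}

  # Length must be between 8 and 123 characters.
  if [ "$len" -lt 8 ] || [ "$len" -gt 123 ]; then
    return 1
  fi

  # Control characters are not allowed.
  case $pw in *[[:cntrl:]]*) return 1 ;; esac

  # Count how many character classes are present.
  case $pw in *[[:upper:]]*) classes=$((classes + 1)) ;; esac
  case $pw in *[[:lower:]]*) classes=$((classes + 1)) ;; esac
  case $pw in *[[:digit:]]*) classes=$((classes + 1)) ;; esac
  # Assumption: "special character" is treated as punctuation.
  case $pw in *[[:punct:]]*) classes=$((classes + 1)) ;; esac

  # At least 3 of the 4 classes must be present.
  [ "$classes" -ge 3 ]
}
```

For example, check_password 'Password1' succeeds (uppercase, lowercase, and digit), while check_password 'password1' fails because only two classes are present.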
To clone and run this repo, you'll need Git, Bicep, and azure-cli installed on your computer. We strongly recommend editing the files in VS Code with the Bicep extension installed, for IntelliSense and other completions.
Click on the above link to deploy the template.
If you need to customize the template, you can use the following commands:
# Clone this repository
$ git clone https://github.com/lordlinus/databricks-all-in-one-bicep-template.git
# Go into the repository
$ cd databricks-all-in-one-bicep-template
# Update main.bicep file with variables as required. Default is for southeastasia region.
# Refer to Azure Databricks UDR section under References for region specific parameters.
$ code main.bicep
# Run the build shell script to create the resources
$ ./build.sh
Note: The build script assumes a Linux environment. If you're using Windows, see this guide on running Linux.
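Before running build.sh, it can help to confirm that the required command-line tools are on the PATH. The following is a minimal sketch, not part of this repo: the helper name missing_tools is hypothetical, and it assumes azure-cli is available under the command name az. It does not check for Bicep, since how Bicep is installed varies.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print each named tool
# that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# git and jq are named in this README; "az" is assumed to be the
# command name of azure-cli.
missing=$(missing_tools git az jq)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing >&2
fi
```

The check only warns and does not stop the shell, so it can be pasted into a terminal session before ./build.sh.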
This template is based on ARM templates from the below repo:
The code in this repo is provided as-is. If you need help or support with Bicep, reach out to the Azure support team (Bicep is supported by Microsoft support and is 100% free to use).
MIT
GitHub @lordlinus · Twitter @lordlinus · Linkedin Sunil Sattiraju