To run these exercises, follow the instructions in each notebook below.
- Storage Settings
- Basics of PySpark, Spark Dataframe, and Spark Machine Learning
- Spark Machine Learning Pipeline
- Hyper-parameter Tuning
- MLeap (requires ML runtime)
- Spark PyTorch Distributor (requires ML runtime)
- Structured Streaming (Basic)
- Structured Streaming with Azure Event Hubs or Kafka
- Delta Lake
- MLflow (requires ML runtime)
- Orchestration with Azure Data Services
- Delta Live Tables
- Databricks SQL
- Create an Azure Databricks resource in Microsoft Azure. When you create the resource, select the Premium plan.
- After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
- Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed to create one.) Be sure to select an ML runtime (not a standard runtime).
- Clone this repository by running the following command, or download HandsOn.dbc directly.

      git clone https://github.com/tsmatz/azure-databricks-exercise
- Import HandsOn.dbc into your Databricks workspace as follows:
    - Select "Workspace" in the workspace UI.
    - Go to your user folder, click your e-mail (the arrow icon), and then select "Import".
    - Pick HandsOn.dbc.
- Open the imported notebooks and attach the above compute (cluster) to every notebook. (Select the compute (cluster) at the top of each notebook.)
- Make sure to run "Exercise 01 : Storage Settings (Prepare)" before running the other notebooks.
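If you prefer the command line to the workspace UI, the import step can also be done with the Databricks CLI. This is a sketch assuming the legacy CLI (`pip install databricks-cli`) is installed and configured; the target path is illustrative and should use your own workspace e-mail:

```shell
# One-time setup: prompts for the workspace URL and a personal access token.
databricks configure --token

# Import the DBC archive into your user folder.
# The target path below is illustrative; replace it with your own e-mail.
databricks workspace import -f DBC HandsOn.dbc /Users/you@example.com/HandsOn
```

These commands talk to a live Azure Databricks workspace, so they require a running workspace and a valid token.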
Note : You cannot use an Azure trial (free) subscription because of its limited quota. If you are on a free subscription, upgrade to pay-as-you-go. (The credit from the free subscription is preserved even after you transition to pay-as-you-go.)
Tsuyoshi Matsuzaki @ Microsoft