MLonBigData

Welcome to a hands-on workshop for Machine Learning on Big Data using Azure Databricks, Azure Data Factory, and the Azure Machine Learning service.

Note: If you would like to watch my presentation of this content from Beijing DevDays in November 2018, you can view it HERE

The datasets and base notebooks were built from the SQL Server 2017 AdventureWorks data warehouse backup (AdventureWorksDW2017.bak) and from the Azure Machine Learning Notebooks.

[Diagram: End-to-end Custom AI Solution]

Prerequisites

To deploy the Azure resources required for this lab, you will need:

  1. An Azure account

    Note: If you don't have an account, you can create your free Azure account here

  2. Microsoft Azure Storage Explorer

  3. Clone this GitHub repository using Git and the following command:

    git clone https://github.com/DataSnowman/MLonBigData.git

Note that you will deploy a number of Azure resources into your Azure subscription either by clicking the Deploy to Azure button below, or by deploying an ARM template and parameters file via the Azure CLI.

[Button: Deploy Bike Buyer Template to Azure]

Important: For the DevDays in Taipei, please use the Southeast Asia region to deploy this solution.

Note: If you encounter issues with resources, please check by running the following commands in the Azure CLI (more information on using the CLI can be found in the Provisioning using the Azure CLI section below):

az login

az account show

az account list-locations

az provider show --namespace Microsoft.Databricks --query "resourceTypes[?resourceType=='workspaces'].locations | [0]" --out table
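
The same query pattern can be used to check region availability for the other resource providers this template deploys. For example, for Azure Data Factory (a sketch, assuming the standard Microsoft.DataFactory provider namespace and its factories resource type):

az provider show --namespace Microsoft.DataFactory --query "resourceTypes[?resourceType=='factories'].locations | [0]" --out table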

Choices for Provisioning

You can provision using the Deploy to Azure button above or by using the Azure CLI.

Provisioning using the Azure Portal

Choose your Subscription, then enter a Resource group, a Location (Southeast Asia for the DevDays in Taipei), a Resource Prefix (a short prefix of 10 characters or less, applied to all resources created by this template so that their names are unique), a SQL Server Username, and a SQL Server Password, and agree to the Terms and Conditions. Then click the Purchase button.

[Screenshot: setup]

When the deployment completes, you will receive a notification in the Azure portal. Click the Go to resource group button.

[Screenshot: preview]

After you open the resource group in the Azure portal, you should see the following deployed resources:

[Screenshot: deploy]

Provisioning using the Azure CLI

  1. Download and install the Azure CLI Installer (MSI) for Windows, Mac, or Linux. Once the installation is complete, open the command prompt and run az login, then copy the access code returned. In a browser, open a private tab and enter the URL aka.ms/devicelogin. When prompted, paste in the access code from above. You will be prompted to authenticate using your Azure account. Go through the appropriate multi-factor authentication.

  2. Navigate to the folder MLonBigData\setup. If using Windows Explorer, you can launch the command prompt by going to the address bar and typing cmd (for the Windows command prompt) or bash (for the Linux command prompt, assuming it is already installed), then type az --version to check the installation. Look for the parameters-mlbigdata.json file you cloned during the Prerequisites above (a sketch of its likely shape appears after this list).

  3. When you logged in to the CLI in step 1 above, you saw a JSON list of all the Azure accounts you have access to. Run az account show to see your current active account. Run az account list -o table if you want to see all of your Azure accounts in a table. If you would like to switch to another Azure account, run az account set --subscription <your SubscriptionId> to set the active subscription. Run az group create -n mlbigdata -l southeastasia to create a resource group called mlbigdata.

  4. Next run the following command to provision the Azure resources:

az group deployment create -g mlbigdata --template-file azureclideploy.json --parameters @parameters-mlbigdata.json
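
For reference, ARM parameters files share a standard shape. Below is a minimal sketch of what parameters-mlbigdata.json might look like; the parameter names (resourcePrefix, sqlServerUsername, sqlServerPassword) are assumptions inferred from the portal form above, so check the actual file in the setup folder before editing:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourcePrefix": { "value": "<your 10-char prefix>" },
    "sqlServerUsername": { "value": "<your SQL admin username>" },
    "sqlServerPassword": { "value": "<your SQL admin password>" }
  }
}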

Once the provisioning is finished, we can run az resource list -g mlbigdata -o table to check what resources were launched. The listed resources include:

  * 1 Storage account
  * 1 Data Factory
  * 1 Databricks workspace
  * 1 SQL Server
  * 1 SQL database
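
To drill into individual resources, the core CLI can also list them by type. For example (a sketch, using the mlbigdata resource group created above; listing the Databricks workspace or the Data Factory requires optional CLI extensions, so az resource list remains the simplest check for those):

az storage account list -g mlbigdata -o table

az sql server list -g mlbigdata -o table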

Data Scientist using Anaconda and Jupyter Notebooks

If you are interested in this scenario, start here

[Screenshot: Original Data Scientist Work]

Data Engineer using Azure Databricks Notebooks

If you are interested in this scenario, start here

[Screenshot: Data Engineering with Azure Databricks]

Data Engineer using Azure Data Factory Data Flow and Azure Databricks

If you are interested in this scenario, start here

[Screenshot: Data Engineering with Azure Data Factory Data Flow]

Data Scientist using Azure Databricks, Databricks Notebooks, and the Azure Machine Learning service SDK

If you are interested in this scenario, start here

[Screenshot: Data Science with Azure Databricks and AML SDK]

Data Scientist using Azure Machine Learning service Visual Interface

If you are interested in this scenario, start here

[Screenshot: Data Science with AML Visual interface Portal]

[Screenshot: Data Science with AML Visual interface]

Data Scientist using Azure Machine Learning service Notebook VMs

If you are interested in this scenario, start here

[Screenshot: Data Science with AML Notebook VMs 1]

[Screenshot: Data Science with AML Notebook VMs 2]

Hope you enjoyed this workshop.