Azure Provision Analytics Workspaces

This project demonstrates how to provision analytics workspaces on Azure using several technologies:

  • Python Azure API
  • Docker Images
  • Azure Container Instances
  • Azure Container Registry
  • Azure Functions

[Figure: screenshot of the pipeline]

Workflow:

  • Begin with an HTTP request to a function app.
  • The function app starts a container instance for a specific Docker image.
  • The Docker image contains the Python code that creates new resources in Azure, such as:
    • Storage Blobs
    • Active Directory Users and Groups
    • Databricks Cluster
    • Permission for access
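
The first step of the workflow can be sketched in Python as follows. This is a minimal sketch, not the project's client: the function URL, the function key, and the JSON payload are all hypothetical, and it assumes the function is protected by a function key passed in the x-functions-key header. The request is built but not sent.

```python
import json
import urllib.request


def build_trigger_request(function_url, function_key, payload):
    """Build, but do not send, the POST request that triggers the function."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Azure Functions accepts the function key in this header.
        "x-functions-key": function_key,
    }
    return urllib.request.Request(
        function_url, data=body, headers=headers, method="POST"
    )


# Hypothetical URL, key, and payload, for illustration only.
request = build_trigger_request(
    "https://pawfunctionapp.azurewebsites.net/api/provision",
    "<function-key>",
    {"workspace": "demo"},
)
print(request.get_method(), request.full_url)
# Send it with urllib.request.urlopen(request) once the values are real.
```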

Setup

This setup deploys the core infrastructure needed to run the solution. There are two phases:

  • Phase 1: Core infrastructure
    • Resource Group
    • Container Registry
    • Service Principal - (Permission to Read from Docker Registry)
    • Function App
  • Phase 2: Container
    • Docker Image
    • Container Instance

Phase 1

Resource Group

Create a resource group for this project

az group create --name provisionAnalyticsWorkspaces --location eastus

Container Registry

Create a private Docker container registry in Azure.

az acr create --resource-group provisionAnalyticsWorkspaces --name pawContainerRegistry --sku Basic

Take note of loginServer in the output, which is the fully qualified registry name (all lowercase). Throughout the rest of this document <registry-name> is a placeholder for the container registry name, and <login-server> is a placeholder for the registry's login server name.

Service Principal

Create a service principal in Azure for pulling images.

The solution uses a service principal to pull images from the private container registry created above.

Create the service principal and save its secrets:

az ad sp create-for-rbac --name sp_paw_test_container_repo --skip-assignment --sdk-auth > local-sp.json

The service principal's clientId and clientSecret, which serve as the registry username and password, are saved to the file local-sp.json.

Role assignment

Next, assign the AcrPull role to the new service principal so that it can pull images from the registry.

$SERVICE_PRINCIPAL_ID = "<service_principal_clientId>"
$ACR_REGISTRY_NAME = "<registry_name>"
$ACR_REGISTRY_ID = az acr show --name $ACR_REGISTRY_NAME  --query id --output tsv

# Create the role assignment
az role assignment create --assignee $SERVICE_PRINCIPAL_ID --scope $ACR_REGISTRY_ID --role acrpull

# Show the role assignment
az role assignment list --assignee $SERVICE_PRINCIPAL_ID

Function App

This Azure Function is the trigger that starts the container. The function app is created on the Consumption plan, which is well suited to event-driven serverless workloads. The function uses a managed identity to start the container instance, and the managed identity is granted a custom role that allows it to do so.

# Create the Custom role
az role definition create --role-definition docs/custom-role.json

# The function app needs a storage account.
az storage account create --name pawstorage4112 --location eastus --resource-group provisionAnalyticsWorkspaces  --sku Standard_LRS
az functionapp create --name pawfunctionApp --storage-account pawstorage4112 --consumption-plan-location eastus --resource-group provisionAnalyticsWorkspaces --os-type linux --runtime python --runtime-version 3.7 --functions-version 2
az functionapp identity assign --name pawfunctionApp --resource-group provisionAnalyticsWorkspaces --role "Container Instance Operator" --scope /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/provisionAnalyticsWorkspaces
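
To illustrate how the function could use its managed identity, here is a minimal sketch, not the project's actual function code. It assumes the container group already exists and only needs to be started, that the azure-identity and azure-mgmt-containerinstance packages are installed, and that the subscription ID and container group name are supplied by the caller. The helper names make_client and start_container_group are hypothetical.

```python
def make_client(subscription_id):
    """Create a Container Instances client using the ambient identity.

    Requires the azure-identity and azure-mgmt-containerinstance packages.
    """
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.containerinstance import ContainerInstanceManagementClient

    # In Azure, DefaultAzureCredential picks up the managed identity.
    return ContainerInstanceManagementClient(
        DefaultAzureCredential(), subscription_id
    )


def start_container_group(client, resource_group, container_group):
    """Ask Azure to start an existing container group and return the poller."""
    return client.container_groups.begin_start(resource_group, container_group)


# Inside the function app (placeholder values):
# client = make_client("<subscription-id>")
# poller = start_container_group(
#     client, "provisionAnalyticsWorkspaces", "<container-group>"
# )
# poller.wait()
```

Passing the client in as an argument keeps start_container_group easy to test with a stand-in object.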

Phase 2

See the Development section for steps to

  • Build and deploy the docker image
  • Deploy a container instance

Development

Set up your dev environment by creating a virtual environment:

# virtualenv \path\to\.venv -p path\to\specific_version_python.exe
python -m venv .venv
.venv\scripts\activate

deactivate

Style Guidelines

This project enforces quite strict PEP8 and PEP257 (Docstring Conventions) compliance on all code submitted.

We use Black for uncompromised code formatting.

Summary of the most relevant points:

  • Comments should be full sentences and end with a period.
  • Imports should be ordered.
  • Constants and the content of lists and dictionaries should be in alphabetical order.
  • It is advisable to adjust IDE or editor settings to match those requirements.

Ordering of imports

Instead of ordering the imports manually, use isort.

pip3 install isort
isort .

Use new style string formatting

Prefer f-strings over % or str.format.

# New
f"{some_value} {some_other_value}"
# Old, wrong
"{} {}".format("New", "style")
"%s %s" % ("Old", "style")

One exception is logging, which uses percent-style formatting with the arguments passed separately. This avoids formatting the log message when the record is suppressed.

_LOGGER.info("Can't connect to the webservice %s at %s", string1, string2)
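
The difference can be observed with a small, self-contained sketch. The Expensive class and the logger name are made up for illustration; the class counts how often it is rendered as a string.

```python
import logging

_LOGGER = logging.getLogger("paw.example")


class Expensive:
    """Count how many times the object is rendered as a string."""

    def __init__(self):
        self.render_count = 0

    def __str__(self):
        self.render_count += 1
        return "expensive value"


def log_lazily(value):
    """Pass the argument separately so logging formats it only if needed."""
    _LOGGER.info("Value is %s", value)


def log_eagerly(value):
    """Build the message up front, even if the record is discarded."""
    _LOGGER.info(f"Value is {value}")


_LOGGER.setLevel(logging.WARNING)  # INFO records are suppressed.
lazy, eager = Expensive(), Expensive()
log_lazily(lazy)
log_eagerly(eager)
print(lazy.render_count, eager.render_count)  # Prints: 0 1
```

With INFO suppressed, the percent-style call never renders its argument, while the f-string renders it before the logger even sees the message.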

Testing

You'll need to install the test dependencies into your Python environment:

pip3 install -r requirements_dev.txt

Now that you have all test dependencies installed, you can run the formatting, spelling, and lint checks on the project:

isort .
codespell  --skip="./.*,*.csv,*.json,*.pyc,./docs/_build/*,./htmlcov/*"
black script
flake8 script
pylint script
pydocstyle script

Build Docker Images

Build and run your image.

Run Docker Image locally

docker build --pull --rm -f "dockerfile" -t provisionanalyticsworkspaces:latest "."
docker run --rm -it provisionanalyticsworkspaces:latest

# Run interactively with environment variables
docker run --rm -it --env-file local.env provisionanalyticsworkspaces:latest

# If you want to see STDOUT, use
docker run --rm -a STDOUT provisionanalyticsworkspaces:latest

Tag for remote registry

docker tag provisionanalyticsworkspaces:latest $ACR_REGISTRY_NAME.azurecr.io/provisionanalyticsworkspaces:v1

az acr login --name $ACR_REGISTRY_NAME
docker push $ACR_REGISTRY_NAME.azurecr.io/provisionanalyticsworkspaces:v1
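
The tag and push commands must use exactly the same image name. One way to keep the name consistent is to build it once, as in this minimal sketch. The helper image_reference is hypothetical, and it assumes a registry in the public Azure cloud, where the login server is the lowercase registry name followed by .azurecr.io.

```python
def image_reference(registry_name, repository, tag):
    """Return the full image name for an Azure Container Registry."""
    # Registry login servers and Docker repository names are lowercase.
    login_server = f"{registry_name.lower()}.azurecr.io"
    return f"{login_server}/{repository.lower()}:{tag}"


IMAGE = image_reference("pawContainerRegistry", "provisionanalyticsworkspaces", "v1")
print(IMAGE)
# Prints: pawcontainerregistry.azurecr.io/provisionanalyticsworkspaces:v1
```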

Deploy Container Instance

Run the new image on Azure Container Instances.

Copy the file deploy-aci-example.yaml to deploy-aci.yaml.

Edit the file deploy-aci.yaml and update with the correct values:

  • image: the full name of the image
  • username: the service principal clientId
  • password: the service principal clientSecret

az container create --resource-group provisionAnalyticsWorkspaces --file deploy-aci.yaml
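
The username and password values come from the service principal file saved earlier. The following minimal sketch shows one way to read them. The helper registry_credentials is hypothetical, and it assumes local-sp.json is UTF-8 encoded; a file written by redirection in Windows PowerShell may be UTF-16 and would need a different encoding argument.

```python
import json
from pathlib import Path


def registry_credentials(sp_file, login_server):
    """Map the saved service principal to ACI registry credentials."""
    # utf-8-sig also accepts a UTF-8 file that starts with a byte order mark.
    secrets = json.loads(Path(sp_file).read_text(encoding="utf-8-sig"))
    return {
        "password": secrets["clientSecret"],
        "server": login_server,
        "username": secrets["clientId"],
    }


# Example call with placeholder values:
# creds = registry_credentials("local-sp.json", "<login-server>")
```

The returned keys mirror the server, username, and password fields used for image registry credentials in an Azure Container Instances YAML file.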

Deploy Function App

Publish the function app from command line or with the VSCode extension.

Copy local.settings.example.json to local.settings.json and replace the placeholders with the correct values. Then publish from the functions directory:

cd /path/to/project/functions
func azure functionapp publish pawfunctionApp

Common Issues

  • No module found.
    • Be sure to run in a virtual environment
  • No module named azure.cli

References