/azure-airflow

Primary LanguageHCLMIT LicenseMIT

Airflow on Azure

This repo contains instructions, samples,and best practices for using Apache Airflow on Azure

Pre-requisite

Getting Started

  • Using Azure CLI:

    • Create service principal
      • Run az ad sp create-for-rbac --skip-assignment and note appId and password
  • In ./terraform:

    • Replace the default for aks_sp_client_id and aks_sp_client_secret with the generated service principal appId and password respectively
    • Run terraform plan -out=out.tfplan
    • Run terraform apply out.tfplan. This process may take up to 5 hours (Azure Redis takes a long time to provision)
  • In ./docker:

    • Run docker build --rm -t <azure-container-reg-login-server>/docker-airflow . to build airflow docker image
    • Run az acr login --name <azure-container-reg-name> to login to container registry
      • For more authentication methods, please see this
    • Run docker push <azure-container-reg-login-server>/docker-airflow to push airflow docker images to ACR
  • Through Portal:

    • Allow connection from your client to APG
      • Navigate to "azure-airflow-pgsrv" -> Select "Connection Security" -> Click "Add Client IP" -> Click "Save"
  • In ./:

    • Create airflow database user and grant it access to airflow database
      • Run psql -h <pg-server-name>.postgres.database.azure.com -U <pg-username>@<pg-server-name> -d airflow -c "create user airflow with encrypted password 'foo'; grant all privileges on database airflow to airflow;"
  • In ./helm:

    • Generate fernet key for Airflow (instructions here)
    • In airflow.yaml, place the generated fernet key as the value of fernetKey
    • In airflow.yaml, place postgresql+psycopg2://airflow@<pg-server-name>:foo@<pg-server-name>.postgres.database.azure.com:5432/airflow?sslmode=require as the value of sqlalchemy_connection
  • Configure kubectl to use AKS cluster context:

    • Run az aks get-credentials --name <airflow-aks-resource-name> --resource-group <airflow-resource-group-name>
      • After context has been merged locally, run kubectl config use-context <airflow-aks-resource-name> subsequently

References

Next Steps