/minikube-airflow

Delux Airflow deployment with Minikube

Primary LanguagePython

Production-like Airflow Development Environment on Minikube

This is a everything you need to deploy Airflow on minikube.

The default setup is using local logging with an NFS server.

Image names are: localhost:5000/base and localhost:5000/dag-img.

We are using local image registry.

The cli tool only helps with the AWS credentials setup and deployment related thing. For gcp auth there is a minikube add-on. Also the run minikube with local registry part should be done by you. After that open a new tab and run the following.

When you enable the registry addon you will get a message something like this:

Registry addon on with docker uses 55005 please use that instead of default 5000

In this case you should use the 55005 in the next command

docker run --rm -it --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):paste-port-number-here"

Please note that this command should run in a separate terminal window in an interactive way !! Otherwise you won't be able to push images to this local registry.

Airflow UI password can be found in helm/files/secrets/airflow/AFPW

Run Minikube with local registry

Start Minikube in the default way

minikube start --insecure-registry "10.0.0.0/24"

Enable registry add-on

minikube addons enable registry

From here you can copy files and set AWS creds manually or use the cli tool

CLI Usage

Go to the airflow-on-minikube folder in the terminal and type

python main.py --help

With the help of the cli tool you can set aws creds, deploy everything with the base setup, copy dag/project/requirements files and change image on the cluster after you modified any of your project/dag files with only one command.

Important to check the copied files especially the dag files since this deployment created a namespace called development and deploys everything there. Which mean you have to change the namespace in the KubernetesPodOperator. Also now the image is called: localhost:5000/dag-img.

Set up everyhting by hand

Go to the docker/base and build the base img after that go to docker/dag and build the dag image

Before you build the your dag image don't forget to add/change the first line to this: FROM localhost:5000/base

cd airflow-on-minikube/docker/base
docker build -t localhost:5000/base .
cd ../dag
docker build -t localhost:5000/dag-img .

After enabling the registry add-on you can see a port number that will be this var {coming from current setup}. Important note: this only works for mac. For windows please check out this page: https://minikube.sigs.k8s.io/docs/handbook/registry/

docker run --rm -it --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):{coming from current setup}"

Push the images to the local registry

docker push localhost:5000/base
docker push localhost:5000/dag-img

Create namespace for dev

kubectl create namespace development

Deploy the PostgreSQL service to create the metadat-databse for Airflow

kubectl apply -f postgres/ --namespace development

Logging

Choose according to your preference. By default the local logging option is used since this is a development environment. In case you want to use Cloud logging please check the local-logging folder readme file to delete the mentioned code parts from the deployment.yml file and also modify the airflow.cfg file inside the docker/base folder.

1.) PersistentVolume logging

This is what we use by default. For the setup and the how to check the local-logging folder readme file.

2.) Cloud storage logging

Create the MyLogConn variable on the Airflow UI, use the same name what is inside the airflow.cfg file.

FOR AWS LOGGING

conn id: MyLogConn conn type: S3 host: bucket name login: AWS_ACCESS_KEY_ID password: AWS_SECRET_ACCESS_KEY

FOR GCP

var name: MyLogConn conn type: Google Cloud Platform project id: gcp project name scopes: https://www.googleapis.com/auth/cloud-platform Keyfile JSON: the service_account.json (upload the actual json file)

3.) Install Lens the Kubernetes IDE

With Lens you can connect to any of your Kubernetes clusters easily and you just have to click on the pod (Airflow task in this case) and click on the first icon from the left in the top right corner. After that in the built-in terminal you'll see the logs.

The easy way to give cloud access to your scripts

For GCP

There is an available add-on to help you setting up GCP auth. Here is the link: https://minikube.sigs.k8s.io/docs/handbook/addons/gcp-auth/

For AWS

Add these to your dag files

secret_id = os.getenv("AWS_ACCESS_KEY_ID", None)
secret_key = os.getenv("AWS_SECRET_ACCESS_KEY", None)

env_vars={
        'AWS_ACCESS_KEY_ID': variable_id,
        'AWS_SECRET_ACCESS_KEY': variable_secret_key}

And put your credentials into the helm/files/secrets/airflow appropriate files. (AWS_ACCESS_KEY_ID, AWS_SECRET_KEY)

Deploy airflow

Based on the default settings deploy nfs server provisioner

helm install nfs-provisioner stable/nfs-server-provisioner --set persistence.enabled=true,persistence.size=5Gi --namespace development

Deploy Persistent volume Claim

kubectl apply -f local-logging/persistent-volume-claim.yml  --namespace development

cd into minikube-airflow/helm and run the following cmd

helm install airflow-minikube-dev . --namespace development

Run the following cmd and copy-paste the url to your browser

minikube service --url airflow --namespace development

Airflow UI password can be found in helm/files/secrets/airflow/AFPW

Deploy new image to our cluster

Helm upgrade

First you have to give some unique tag to your new dag image

Example:

docker build -t localhost:5000/dag-img:modified .
docker tag localhost:5000/dag-img:modified localhost:5000/dag-img:latest
docker push localhost:5000/dag-img:modified
docker push localhost:5000/dag-img:latest

After this you have to change the image in you cluster

helm upgrade airflow-minikube-dev helm/ --install --wait --atomic --set dags_image.tag=modified

Clean up

helm uninstall airflow-minikube-dev --namespace development
helm uninstall nfs-provisioner --namespace development
kubectl delete namespace development

In case of clean_up command doesn't work

It is possible the while deleting a namespace it's stuck at a terminating phase. When it happens just use run

kubectl get all --namespace development

adn use the pod/podname of the problematic pod and paste it

kubectl delete pod/podname --grace-period=0 --force --namespace development