
Airflow ETL

Running an ECS task for ML prediction orchestrated by Airflow

Building Airflow on Docker

docker pull puckel/docker-airflow

Build the image (the Dockerfile installs boto3 for AWS access):

docker build -t ml-pipeline .
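
A minimal Dockerfile for this image can look like the sketch below (the puckel base image runs as the airflow user, so a --user install avoids permission issues; your actual Dockerfile may add more dependencies):

FROM puckel/docker-airflow
# boto3 lets DAG code and tasks call AWS APIs (S3, ECS)
RUN pip install --user boto3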

Create a volume that maps the local directory holding your DAG definitions to the location where Airflow reads them inside the container, and start the container:

docker run -d -p 8080:8080 -v /Users/danieldacosta/Documents/GitHub/airflow-etl/dags:/usr/local/airflow/dags ml-pipeline
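
Once the container is running, the Airflow web UI is reachable at http://localhost:8080, and any DAG file placed in the mapped dags directory will appear there after the scheduler rescans it.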

S3

In this example we use two buckets: one that stores the model (.sav) and the input data (.csv), and another that stores the model output.

  • READ_BUCKET=ml-sls-deploy-prd
  • READ_DATA_PATH=data
  • READ_MODELS_PATH=models
  • WRITE_BUCKET=ml-sls-deploy-prd-results
  • WRITE_DATA_PATH=results
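
Inside the prediction container, these variables can be consumed with boto3 along the following lines (a sketch; the object keys model.sav, input.csv and results.csv are placeholders for your actual file names):

import os
import boto3

s3 = boto3.client("s3")

# Download the serialized model and the input data from the read bucket
s3.download_file(os.environ["READ_BUCKET"],
                 os.environ["READ_MODELS_PATH"] + "/model.sav", "/tmp/model.sav")
s3.download_file(os.environ["READ_BUCKET"],
                 os.environ["READ_DATA_PATH"] + "/input.csv", "/tmp/input.csv")

# ... load the model, run the prediction, write /tmp/results.csv ...

# Upload the predictions to the write bucket
s3.upload_file("/tmp/results.csv", os.environ["WRITE_BUCKET"],
               os.environ["WRITE_DATA_PATH"] + "/results.csv")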

Deploy your ECS cluster

You will need to create the following objects:

  • Create a Cluster: choose the Networking only cluster template. This setup uses Fargate tasks: the Fargate launch type lets you run your containerized applications without provisioning and managing the backend infrastructure. When you run a task with a Fargate-compatible task definition, Fargate launches the containers for you. (The cluster can also be created from the AWS CLI; see the sketch after this list.)

  • Task Definition: the blueprint for your container. You'll need to create a Task Role, an IAM role that tasks can use to make API requests to authorized AWS services; since our container reads from and writes to S3, it needs those permissions. You will also need a Task Execution Role, an IAM role that allows ECS to pull images from your Docker registry (we are using ECR here).

  • Add a Container: deploy your container to ECS Fargate. You can use the Docker image in the ml-pipeline folder as an example.
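
If you prefer the AWS CLI over the console, the cluster and task definition can be created along these lines (a sketch; ml-pipeline-cluster is an assumed name, and task-definition.json must declare your container image plus the task and execution roles described above):

aws ecs create-cluster --cluster-name ml-pipeline-cluster
aws ecs register-task-definition --cli-input-json file://task-definition.json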

I recommend that you follow this tutorial: https://towardsdatascience.com/step-by-step-guide-of-aws-elastic-container-service-with-images-c258078130ce.

Setting environment variables on Airflow

You will need to set up your AWS credentials and ECS variables in the Airflow console (Admin → Variables).
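
For reference, here is a minimal sketch of a DAG that reads those variables and launches the ECS task with Airflow's ECSOperator (Airflow 1.10-style imports, matching puckel/docker-airflow; every variable key and resource name below is an assumption, so match them to what you actually created):

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.contrib.operators.ecs_operator import ECSOperator

# Variable keys are illustrative; use the names you saved under Admin -> Variables
ECS_CLUSTER = Variable.get("ecs_cluster")
ECS_TASK_DEFINITION = Variable.get("ecs_task_definition")
ECS_SUBNETS = Variable.get("ecs_subnets").split(",")  # comma-separated subnet IDs

default_args = {"owner": "airflow", "start_date": datetime(2020, 1, 1)}

with DAG("ml_prediction", default_args=default_args, schedule_interval=None) as dag:
    run_prediction = ECSOperator(
        task_id="run_ml_prediction",
        cluster=ECS_CLUSTER,
        task_definition=ECS_TASK_DEFINITION,
        launch_type="FARGATE",
        overrides={"containerOverrides": []},  # per-run env vars can be injected here
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ECS_SUBNETS,
                "assignPublicIp": "ENABLED",
            }
        },
        region_name=Variable.get("aws_region"),
    )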

Run DAG

Once everything is set up, you can trigger your DAG manually and check that everything ran as expected.
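
You can trigger it from the web UI, or from the Airflow 1.10 CLI inside the container (ml_prediction is the DAG id assumed in the sketch above):

docker exec <container-id> airflow trigger_dag ml_prediction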
