Airflow on ECS

Infrastructure scripts to orchestrate an Apache Airflow cluster on ECS

These scripts are ported from aws-infrastructure

Prerequisites

AWS CLI

The scripts rely on the AWS CLI. You can configure credentials in a file:

~/.aws/credentials:

[default]
aws_access_key_id=changeme
aws_secret_access_key=changeme
region=us-west-2

If you prefer, you can use temporary credentials from AWS Security Token Service (STS) as well.
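
For example, a minimal sketch of fetching temporary credentials by assuming a role (the role ARN and session name below are placeholders):

aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/airflow-deployer \
    --role-session-name airflow-deploy \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text

Export the three returned values as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN before running the scripts.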

Encrypting sensitive passwords

The kms-encrypt bash script creates a new KMS data key and stores its ciphertext in a JSON configuration file if one isn't already there. The plaintext data key is then used to encrypt a sensitive configuration value for the given parameter.

kms_key_id=changeme
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=ParameterKey --secret='changeme' \
    --config-path=path-to-config.json

Note: This script runs in a Docker container based on the python:3.6-slim image, which ships OpenSSL 1.1.0. This is the same version used by puckel/docker-airflow:1.9.0-4, the base of our ECS Docker image. If the OpenSSL versions don't match, the containers will not be able to decrypt the passwords and keys.
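
For reference, a rough sketch of the flow kms-encrypt automates; this is not its exact implementation, and the variables are illustrative:

# 1. Generate a data key under the CMK; the response contains a base64
#    Plaintext key and a CiphertextBlob (the blob is what the config stores)
aws kms generate-data-key --key-id "${kms_key_id}" --key-spec AES_256

# 2. Encrypt the secret with the plaintext data key, e.g. via OpenSSL
echo -n 'changeme' | openssl enc -aes-256-cbc -a -pass "pass:${plaintext_key}"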

Deploying Cloudformation Stack

Build the docker image

The CloudFormation templates use the ECR registry to pull Docker images. However, it might be useful to push images to Docker Hub for developers who may not have access to AWS. The build-docker-image script supports both ECR and Docker Hub.

ECR

Run the following command:

eval $(aws ecr get-login --no-include-email --region us-west-2)

The aws ecr get-login call prints a docker login command, and the eval executes it to log you in to ECR.

To build the image, make sure to change the account ID:

aws_account_id=changeme
./build-docker-image --repo ecr --aws-account-id ${aws_account_id} --region us-west-2 --version 0.0.4
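
To confirm the push succeeded, you can list the pushed image tags (assuming the ECR repository is named airflow):

aws ecr describe-images --repository-name airflow --region us-west-2 \
    --query 'imageDetails[].imageTags'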

Docker Hub

Log in to Docker Hub with your credentials:

docker login

To build the image:

./build-docker-image --repo dockerhub --version 0.0.4

Postgres (RDS)

Create a JSON file dev-airflow-postgres.json to override non-default parameters. Note that the SubnetIds are private subnets; instructions on connecting to the database over an SSH tunnel are included below.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "SubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "DatabaseName": "airflow",
      "StorageInGb": 100,
      "StorageIops": 1000,
      "PostgresVersion": "10.3",
      "DbInstanceType": "db.t2.small",
      "AllowedCIDR": "10.0.0.0/16",
      "BackupRetentionInDays": "7",
      "MultiAZDeployment": "true",
      "PostgresMasterUsername": "airflow_user",
      "KmsKeyId": "arn:aws:kms:us-west-2:************:key/********-****-****-****-************",
      "Organization": "Freckle IoT",
      "Team": "Freckle",
      "Environment": "dev",
      "Component": "Airflow"
    }
}

Deploy the Cloudformation template:

SENSITIVE_PARAMS='"PostgresMasterPassword=changeme"' ./deploy-stack \
    cloudformation/postgres-rds.cloudformation.yaml \
    dev-airflow-postgres ../ecs-airflow-config/dev-airflow-postgres.json

Note: We pass the password in this manner because kms-encrypt won't help in this case (RDS expects the plaintext master password, so an encrypted value would not work), and it keeps these sensitive passwords out of source control.
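
If you need to generate a strong master password first, one simple option is:

openssl rand -base64 24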

SSH Tunnel to RDS:

ssh -i path-to-pem -N -L 5432:postgres-end-point:5432 ec2-user@bastion-host
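
If you don't have the Postgres endpoint handy, you can look it up via RDS (the stack may also expose it as a Cloudformation output):

aws rds describe-db-instances \
    --query 'DBInstances[].Endpoint.Address' --output text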

Run the PostgreSQL client (you may need to install it on your system first):

psql -h 127.0.0.1 -d airflow -U airflow_user

Set up the schema:

CREATE SCHEMA IF NOT EXISTS airflow AUTHORIZATION airflow_user;
ALTER ROLE airflow_user SET search_path TO airflow;
GRANT USAGE ON SCHEMA airflow TO airflow_user;
GRANT CREATE ON SCHEMA airflow TO airflow_user;
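
To verify, open a new connection (ALTER ROLE only affects new sessions) and check the search path:

psql -h 127.0.0.1 -d airflow -U airflow_user -c 'SHOW search_path;'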

Redis (ElastiCache)

Create a JSON file dev-airflow-redis.json to override non-default parameters. Note that the SubnetIds are the same private subnets used for the RDS instance.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "SubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "RedisCacheNodeType": "cache.t2.small",
      "RedisVersion": "4.0.10",
      "AllowedCIDR": "10.0.0.0/16",
      "Organization": "My Org",
      "Team": "Airflow Team",
      "Environment": "dev",
      "Component": "Airflow"
     }
}

Deploy the Cloudformation template:

./deploy-stack cloudformation/redis-cluster.cloudformation.yaml \
    dev-airflow-redis ../ecs-airflow-config/dev-airflow-redis.json
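
Once the stack is up, one way to find the Redis endpoint is via ElastiCache (the stack may also export it as an output):

aws elasticache describe-cache-clusters --show-cache-node-info \
    --query 'CacheClusters[].CacheNodes[].Endpoint.Address' --output text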

ECS Cluster

Create a JSON configuration file dev-airflow-ecs.json. Note that the InstanceSubnetIds are the same as SubnetIds for the RDS cluster and are private subnets. The LoadBalancerSubnetIds are public subnets.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "InstanceSubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "LoadBalancerSubnetIds": "subnet-cccccccc,subnet-dddddddd",
      "EcsInstanceType": "m5.large",
      "UseSSL": "yes",
      "BastionStack": "changeme",
      "CertificateArn": "arn:aws:acm:us-west-2:************:certificate/********-****-****-****-************",
      "LoadBalancerType": "internet-facing",
      "AllowedCidrIp1": "changeme",
      "AllowedCidrIp2": "changeme",
      "CloudWatchLogGroup": "dev-airflow",
      "CloudWatchLogRetentionInDays": 180,
      "KeyName": "changeme",
      "Organization": "My Org",
      "Team": "Airflow Team",
      "Environment": "dev",
      "Component": "Airflow"
    }
}

Deploy the Cloudformation template:

./deploy-stack cloudformation/ecs-cluster.cloudformation.yaml \
    dev-airflow-ecs ../ecs-airflow-config/dev-airflow-ecs.json
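
After the stack completes, you can sanity-check the cluster and its registered instances (assuming the ECS cluster shares the stack's name):

aws ecs describe-clusters --clusters dev-airflow-ecs \
    --query 'clusters[].[clusterName,status,registeredContainerInstancesCount]'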

NOTES:

  • The Cloudformation stack will also create an S3 bucket with the same name as the stack (see the example below).
  • The bucket has versioning enabled; although s3fs does not itself support object versions, it will always show the latest version of each S3 object.
  • The bucket also has the DeletionPolicy set to Retain, so if the stack is deleted, the bucket will be left behind.
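
For example, to inspect the bucket or upload a DAG into it (the dags/ prefix and file name are illustrative; check how the s3fs mount is configured in your deployment):

aws s3 ls s3://dev-airflow-ecs/
aws s3 cp dags/my_dag.py s3://dev-airflow-ecs/dags/my_dag.py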

Airflow Components

The cloudformation/airflow-ecs-services folder contains a nested stack that deploys the following ECS services:

  • Airflow Webserver
  • Celery Flower Monitoring Tool
  • Scheduler
  • Multiple Workers

Create a JSON file dev-airflow-ecs-services.json to override non-default parameters.

{
  "Parameters": {
    "EcsStackName": "dev-airflow-ecs",
    "PostgresDbStackName": "dev-airflow-postgres",
    "RedisStackName": "dev-airflow-redis",
    "PostgresUsername": "airflow_user",
    "RedisDb": "0",
    "CloudWatchLogGroup": "dev-airflow",
    "HostedZoneId": "chamgeme",
    "HostedZoneName": "example.com.",
    "DNSPrefix": "dev-airflow",
    "AirflowUserName": "admin",
    "AirflowEmail": "admin@example.com",
    "GoogleOAuthClientId": "changeme",
    "GoogleOAuthDomain": "changeme",
    "AirflowDockerImage": "************.dkr.ecr.us-west-2.amazonaws.com/airflow:0.0.2",
    "MinWebserverTasks": 1,
    "MaxWebserverTasks": 3,
    "DesiredWebserverTasks": 1,
    "MinFlowerTasks": 1,
    "MaxFlowerTasks": 3,
    "DesiredFlowerTasks": 1,
    "MinWorkerTasks": 1,
    "MaxWorkerTasks": 4,
    "DesiredWorkerTasks": 1,
    "SMTPUser": "changeme",
    "SMTPPassword": "changeme",
    "SMTPHost": "changeme",
    "SMTPPort": "change",
    "SMTPStartTLS": "changeme",
    "SMTPSSL": "changeme",
    "Organization": "My Org",
    "Team": "Airflow Team",
    "Environment": "dev",
    "Component": "Airflow"
  }
}

Configure the passwords and keys as follows:

kms_key_id=changeme
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=PostgresPasswordEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

fernet_key=$(docker run puckel/docker-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=FernetKeyEnc --secret="${fernet_key}" \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=GoogleOAuthClientSecretEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=SMTPPasswordEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

Deploy the Cloudformation template:

./deploy-nested-stack airflow-ecs-services \
    dev-airflow-ecs-services ../ecs-airflow-config/dev-airflow-ecs-services.json
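
You can watch the deployment's progress from the CLI:

aws cloudformation describe-stacks --stack-name dev-airflow-ecs-services \
    --query 'Stacks[0].StackStatus' --output text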

NOTE: All changes should be done via the master stack dev-airflow-ecs-services. Do not update or destroy individual stacks within the nested stack as that will make it difficult to manage and deploy changes to the master stack running the ECS services.

Logging

The current ECS deployment of Airflow cannot fetch logs from individual worker tasks: worker ports are mapped to random host ports, whereas Airflow's log-fetching configuration expects the fixed port 8793. Also, each worker advertises its internal short hostname, which is the Docker container ID and is not addressable between ECS services.

You will see the following message when trying to view the logs from an Airflow job:

*** Log file isn't local.
*** Fetching here: http://673ee7a2fba0:8793/log/airflow-test/airflow-test-run/2018-07-25T01:00:51.105165/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='673ee7a2fba0', port=8793): Max retries exceeded with url: /log/airflow-test/airflow-test-run/2018-07-25T01:00:51.105165/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb737ed0ef0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

These logs can be seen in CloudWatch Logs by searching the Log Streams beginning with ecs-service/workers/*.
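
These can also be pulled from the CLI using the CloudWatchLogGroup configured above:

aws logs filter-log-events --log-group-name dev-airflow \
    --log-stream-name-prefix ecs-service/workers/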