Ansible ECS provisioning and deployment with Docker containers

This is a suite of Ansible playbooks to provision an ECS (Elastic Container Service) cluster on AWS, running a webapp deployed on Docker containers in the cluster and load balanced from an ALB, with the Docker image for the app pulled from an ECR (Elastic Container Registry) repository.

First a Docker image is built locally and pushed to a private ECR repository, then the EC2 SSH key and Security Groups are created. Next, a Target Group and corresponding ALB (Application Load Balancer type of ELB) are provisioned, and an ECS container instance is launched on EC2 for the ECS cluster. Finally the ECS cluster itself is provisioned, an ECS task definition is created to pull and launch containers from the Docker image in ECR, and an ECS Service is provisioned to run the webapp task on the cluster as per the Service definition.
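
To give a flavour of how the final steps fit together, below is a minimal sketch of registering the task definition and creating the Service with the community.aws modules. The resource names and the target_group_arn variable here are illustrative placeholders rather than values taken from these playbooks, so refer to etc/variables.yml and the playbooks themselves for the real ones.

# Illustrative sketch only -- names and variables are placeholders.
- name: Register the ECS task definition for the webapp
  community.aws.ecs_taskdefinition:
    family: simple-webapp
    state: present
    containers:
      - name: simple-webapp
        image: "{{ ecr_repo }}:latest"
        memory: 256
        essential: true
        portMappings:
          - containerPort: 8080
            hostPort: 8080

- name: Create the ECS Service to run the webapp task behind the ALB
  community.aws.ecs_service:
    name: simple-webapp
    cluster: simple-webapp
    task_definition: simple-webapp
    desired_count: 1
    load_balancers:
      - targetGroupArn: "{{ target_group_arn }}"
        containerName: simple-webapp
        containerPort: 8080
    state: present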

This is an Ansible framework to serve as a basis for building Docker images for your webapp and deploying them as containers on Amazon ECS. It can be expanded in multiple ways, the most obvious being to increase the number of running containers and ECS instances, either with manual scaling or ideally by adding auto-scaling.

CentOS 7 is used for the Docker container, but this can be changed to a different Linux distro if desired. Amazon Linux 2 is used for the ECS cluster instances on EC2.

I created a very basic Python webapp to use as an example for the deployment here, but you can replace that with your own webapp should you so wish.

N.B. Until you've tested this and honed it to your needs, run it in a completely separate environment for safety reasons, otherwise there is potential here for accidental destruction of parts of existing environments. Create a separate VPC specifically for this, or even use an entirely separate AWS account.

Accompanying blog article.

Installation/setup

  1. You'll need an AWS account with a VPC set up, and with a DNS domain set up in Route 53.
  2. Install and configure the latest version of the AWS CLI. The settings in the AWS CLI configuration files are needed by the Ansible modules in these playbooks. Also, the Ansible AWS modules aren't perfect, so there are a few tasks which need to run the AWS CLI as a local external command (there's a sketch of that pattern just after this list). If you're using a Mac, I'd recommend using Homebrew as the simplest way of installing and managing the AWS CLI.
  3. If you don't already have it, you'll need Python 3. You'll also need the boto and boto3 Python modules (for Ansible modules and dynamic inventory) which can be installed via pip.
  4. Ansible needs to be installed and configured. Again, if you're on a Mac, using Homebrew for this is probably best.
  5. Docker needs to be installed and running. For this it's probably best to refer to the instructions on the Docker website.
  6. Copy etc/variables_template.yml to etc/variables.yml and update the static variables at the top for your own environment setup.
  7. ECR Docker Credential Helper needs to be installed so that the local Docker daemon can authenticate with Elastic Container Registry in order to push images to a repository there. Follow the link for installation instructions (on a Mac, as usual, I'd recommend the Homebrew method).
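
As mentioned in step 2, a few tasks fall back to running the AWS CLI locally where the Ansible modules don't cover something. Purely to illustrate the pattern (the command and names here are made up, not taken from the playbooks), such a task looks something like this:

# Hypothetical example of shelling out to the AWS CLI from a playbook.
- name: Look up the ALB DNS name with the AWS CLI
  ansible.builtin.command: >
    aws elbv2 describe-load-balancers
    --names simple-webapp
    --query "LoadBalancers[0].DNSName"
    --output text
  delegate_to: localhost
  register: alb_dns_lookup
  changed_when: false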

Configuring ECR Docker Credential Helper

The method which worked best for me was to add a suitable "credHelpers" section to my ~/.docker/config.json file:

"credHelpers": {
    "000000000000.dkr.ecr.eu-west-2.amazonaws.com": "ecr-login"
}

(I've replaced my AWS account ID with zeros, but otherwise this is correct.)

So, for me, the whole ~/.docker/config.json ended up looking like this. Yours may not be quite the same but hopefully it clarifies how to add the "credHelpers" section near the end:

{
    "auths": {
        "000000000000.dkr.ecr.eu-west-2.amazonaws.com": {},
        "https://index.docker.io/v1/": {
            "auth": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        }
    },
    "credHelpers": {
        "000000000000.dkr.ecr.eu-west-2.amazonaws.com": "ecr-login"
    }
}

If your AWS credentials are also set correctly, you should now have no trouble pushing Docker images to ECR repositories.

Usage

These playbooks are run in the standard way, i.e:

ansible-playbook PLAYBOOK_NAME.yml

To deploy your own webapp instead of my basic Python app, you'll need to modify build_push.yml so it pulls your own app from your repo, then you can edit the variables as needed in etc/variables.yml.
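
For instance, the checkout and image build/push steps will end up looking something along these lines; this is only a rough sketch, with a placeholder repository URL and checkout path, and the actual tasks in build_push.yml may be organised differently:

# Rough sketch -- the repo URL and checkout path are placeholders.
- name: Pull the webapp source from your own Git repository
  ansible.builtin.git:
    repo: https://github.com/youruser/your-webapp.git
    dest: docker/webapp
    version: main

- name: Build the Docker image and push it to the ECR repository
  community.docker.docker_image:
    name: "{{ ecr_repo }}"
    tag: latest
    source: build
    build:
      path: docker
    push: true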

Playbooks for build/provisioning/deployment

  1. build_push.yml - pulls the webapp from GitHub, builds a Docker image using docker/Dockerfile which runs the webapp, and pushes the image to a private ECR repository.
  2. provision_key_sg.yml - provisions an EC2 SSH key, and Security Groups for ECS container instances and ELB.
  3. provision_production.yml - provisions Target Group and associated ALB (Application Load Balancer type of ELB) for load balancing the containers, provisions IAM setup for ECS instances, launches ECS container instance on EC2, provisions ECS cluster, and sets up ECS task definition and Service so the webapp containers deploy on the cluster using the Docker image in ECR.
  4. provision_dns.yml - provisions the DNS in Route 53 for the ALB; note that it may take a few minutes for the DNS to propagate before it becomes usable.
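
As an illustration of what the DNS step amounts to, an alias record pointing at the ALB can be created with the route53 module roughly as follows; the domain, record name and hosted zone variable are placeholders, not values from these playbooks:

# Illustrative sketch -- domain, record and hosted zone variable are placeholders.
- name: Create an alias record in Route 53 pointing at the ALB
  community.aws.route53:
    state: present
    zone: yourdomain.com
    record: staging.yourdomain.com
    type: A
    value: "{{ elb_dns }}"
    alias: true
    alias_hosted_zone_id: "{{ elb_hosted_zone_id }}"
    overwrite: true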

There are comments dotted about in the playbooks to help further explain certain aspects of what is going on.

The playbooks build on each other, so on a fresh setup, running a later playbook without having run the earlier ones will fail due to missing components, variables and so on. Running all four playbooks in succession will set up the entire infrastructure from start to finish.

Once everything is built successfully, the ECS service will attempt to run a task to deploy the webapp containers in the cluster. Below are instructions for how to check the service event log to see task deployment progress.

Redeployment

Once the environment is up and running, any changes to the app can be rebuilt and redeployed by running Steps 1 and 3 again. This makes use of the rolling deployment mechanism within ECS for a smooth automated transition to the new version of the app.
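
As an aside, and not something these playbooks rely on: if you rebuild and re-push the image under the same tag and simply want ECS to pull it afresh without any other changes, recent versions of the community.aws.ecs_service module have a force_new_deployment flag that triggers the same rolling replacement, along the lines of:

# Aside, not part of these playbooks: force a rolling redeploy of the
# existing task definition (assumes a recent community.aws collection).
- name: Force a new rolling deployment of the existing service
  community.aws.ecs_service:
    name: simple-webapp
    cluster: simple-webapp
    task_definition: simple-webapp
    desired_count: 1
    force_new_deployment: true
    state: present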

Playbooks for deprovisioning

  1. destroy_all.yml - destroys the entire AWS infrastructure.
  2. delete_all.yml - clears all dynamic variables in the etc/variables.yml file, deletes the EC2 SSH key, removes the local Docker image, and deletes the local webapp repo in the docker directory.

USE destroy_all.yml WITH EXTREME CAUTION! If you're not operating in a completely separate environment, or if your shell is configured for the wrong AWS account, you could potentially cause serious damage with this. Always check before running it (aws sts get-caller-identity will show which account your shell is using) that you are working in the correct isolated environment and that you are absolutely 100 percent sure you want to do this. Don't say I didn't warn you!

Once everything has been fully destroyed, it's safe to run the delete_all.yml playbook to clear out the variables file. Do not run this until you are sure everything has been destroyed, because the SSH key file cannot be recovered once it has been deleted.

Checking the Docker image in a local container

After building the Docker image in Step 1, if you want to run a local container from the image for initial testing purposes, you can use standard Docker commands for this:

docker run -d --name simple-webapp -p 8080:8080 $(grep ecr_repo etc/variables.yml | cut -d" " -f2):latest

You should then be able to make a request to the local container at:

http://localhost:8080/

To check the logs:

docker logs simple-webapp

To stop the container:

docker stop simple-webapp

To remove it:

docker rm simple-webapp

Checking deployment status, logs, etc.

To check the state of the deployment and see events in the service log:

aws ecs describe-services --cluster simple-webapp --services simple-webapp --output text

This should show what's happening on the cluster in terms of task deployment, and hopefully you'll eventually see that the process successfully starts, registers on the load balancer, and completes deployment, at which point it should reach a "steady state":

EVENTS  2022-02-23T13:04:39.900000+00:00        3a087c70-aaa3-47d5-ae31-040db688155a    (service simple-webapp) has reached a steady state.
EVENTS  2022-02-23T13:04:39.899000+00:00        c0785dae-154d-440b-b315-f948901d48fb    (service simple-webapp) (deployment ecs-svc/4617274246689568181) deployment completed.
EVENTS  2022-02-23T13:04:20.239000+00:00        c60ce4fa-e7a6-4776-907b-b931a166109a    (service simple-webapp) registered 1 targets in (target-group arn:aws:elasticloadbalancing:eu-west-2:000000000000:targetgroup/simple-webapp/2ec4fbc39edca3aa)
EVENTS  2022-02-23T13:03:50.185000+00:00        2e2c4570-2bb3-45f3-83e6-84b61b9c63bb    (service simple-webapp) has started 1 tasks: (task 8b8f8d2258a74885b58e610fbf19a2cc).

Check the webapp via the ALB (ELB):

curl http://$(grep elb_dns etc/variables.yml | cut -d" " -f2)

Check the webapp using DNS (once the DNS has propagated, and replacing yourdomain.com with the domain you are using):

curl http://staging.yourdomain.com/

Get the container logs from running instances:

ansible -i etc/inventory.aws_ec2.yml -u ec2-user --private-key etc/ec2_key.pem tag_Environment_Production -m shell -a "docker ps | grep simple-webapp | cut -d\" \" -f1 | xargs docker logs"

You can also use that method to run ad hoc Ansible commands on the instances, e.g. uptime:

ansible -i etc/inventory.aws_ec2.yml -u ec2-user --private-key etc/ec2_key.pem tag_Environment_Production -m shell -a "uptime"

If you need to SSH to the instance (assuming there's only one instance):

ssh -i etc/ec2_key.pem ec2-user@$(aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" --query "Reservations[*].Instances[*].PublicDnsName" --output text)

For multiple instances, list the public DNS names as follows, then SSH to each individually as needed:

aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" --query "Reservations[*].Instances[*].PublicDnsName"