/elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC

Primary LanguageShellMIT LicenseMIT

Elastic CI Stack for AWS

Build status

The Buildkite Elastic CI Stack gives you a private, autoscaling Buildkite Agent cluster. Use it to parallelize legacy tests across hundreds of nodes, run tests and deployments for all your Linux-based services and apps, or run AWS ops tasks.

For documentation on a release, such as the latest stable release, please see its Documentation section.

Features:

  • All major AWS regions
  • Configurable instance size
  • Configurable number of buildkite agents per instance
  • Configurable spot instance bid price
  • Configurable auto-scaling based on build activity
  • Docker and Docker Compose support
  • Per-pipeline S3 secret storage (with SSE encryption support)
  • Docker Registry push/pull support
  • CloudWatch logs for system and buildkite agent events
  • CloudWatch metrics from the Buildkite API
  • Support for stable, beta or edge Buildkite Agent releases
  • Create as many instances of the stack as you need
  • Rolling updates to stack instances to reduce interruption

Contents

Getting Started

See the Elastic CI Stack for AWS guide for a step-by-step guide, or jump straight in:

Launch AWS Stack

Current release is . See Releases for older releases, or Versions for development version

Although the stack will create it's own VPC by default, we highly recommend following best practice by setting up a separate development AWS account and using role switching and consolidated billing—see the Delegate Access Across AWS Accounts tutorial for more information.

If you'd like to use the AWS CLI, download config.json.example, rename it to config.json, and then run the below command:

aws cloudformation create-stack \
  --output text \
  --stack-name buildkite \
  --template-url "https://s3.amazonaws.com/buildkite-aws-stack/latest/aws-stack.yml" \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --parameters "$(cat config.json)"

Build Secrets

The stack will have created an S3 bucket for you (or used the one you provided as the SecretsBucket parameter). This will be where the agent will fetch your SSH private keys for source control, and environment hooks to provide other secrets to your builds.

The following s3 objects are downloaded and processed:

  • /env - An agent environment hook
  • /private_ssh_key - A private key that is added to ssh-agent for your builds
  • /git-credentials - A git-credentials file for git over https
  • /{pipeline-slug}/env - An agent environment hook, specific to a pipeline
  • /{pipeline-slug}/private_ssh_key - A private key that is added to ssh-agent for your builds, specific to the pipeline
  • /{pipeline-slug}/git-credentials - A git-credentials file for git over https, specific to a pipeline

These files are encrypted using Amazon's KMS Service. See the Security section for more details.

Here's an example that shows how to generate a private SSH key, and upload it with KMS encryption to an S3 bucket:

# generate a deploy key for your project
ssh-keygen -t rsa -b 4096 -f id_rsa_buildkite
pbcopy < id_rsa_buildkite.pub # paste this into your github deploy key

aws s3 cp --acl private --sse aws:kms id_rsa_buildkite "s3://${SecretsBucket}/private_ssh_key"

If you want to set secrets that your build can access, create a file that sets environment variables and upload it:

echo "export MY_ENV_VAR=something secret" > myenv
aws s3 cp --acl private --sse aws:kms myenv "s3://${SecretsBucket}/env"
rm myenv

Note: Currently only using the default KMS key for s3 can be used, follow #235 for progress on using specific KMS keys

If you really want to store your secrets unencrypted, you can disable it entirely with BUILDKITE_USE_KMS=false.

What’s On Each Machine?

What Type of Builds Does This Support?

This stack is designed to run your builds in a share-nothing pattern similar to the 12 factor application principals:

  • Each project should encapsulate it's dependencies via Docker and Docker Compose
  • Build pipeline steps should assume no state on the machine (and instead rely on build meta-data, build artifacts or S3)
  • Secrets are configured via environment variables exposed using the S3 secrets bucket

By following these simple conventions you get a scaleable, repeatable and source-controlled CI environment that any team within your organization can use.

Multiple Instances of the Stack

If you need to different instances sizes and scaling characteristics between pipelines, you can create multiple stack. Each can run on a different Agent Queue, with it's own configuration, or even in a different AWS account.

Examples:

  • A docker-builders stack that provides always-on workers with hot docker caches (see Optimizing for Slow Docker Builds)
  • A pipeline-uploaders stack with tiny, always-on instances for lightning fast buildkite-agent pipeline upload jobs.
  • A deploy stack with added credentials and permissions specifically for deployment.

Autoscaling

If you have configured MinSize < MaxSize, the stack will automatically scale up and down based on the number of scheduled jobs.

This means you can scale down to zero when idle, which means you can use larger instances for the same cost.

Metrics are collected with a Lambda function, polling every minute.

Terminating the instance after job is complete

You may set BuildkiteTerminateInstanceAfterJob to true to force the instance to terminate after it completes a job. Setting this value to true tells the stack to enable disconnect-after-job in the buildkite-agent.cfg file.

While not enforced, it is highly recommended you also set your AgentsPerInstance value to 1.

We strongly encourage you to find an alternative to this setting if at all possible. The turn around time for replacing these instances is currently slow (5-10 minutes depending on other stack configuration settings). If you need single use jobs, we suggest looking at our container plugins like docker, docker-compose, and ecs, all which can be found here.

Docker Registry Support

If you want to push or pull from registries such as Docker Hub or Quay you can use the environment hook in your secrets bucket to export the following environment variables:

  • DOCKER_LOGIN_USER="the-user-name"
  • DOCKER_LOGIN_PASSWORD="the-password"
  • DOCKER_LOGIN_SERVER="" - optional. By default it will log into Docker Hub

Setting these will perform a docker login before each pipeline step is run, allowing you to docker push to them from within your build scripts.

If you are using Amazon ECR you can set the ECRAccessPolicy parameter to the stack to either readonly, poweruser, or full depending on the access level you want your builds to have

You can disable this in individual pipelines by setting AWS_ECR_LOGIN=false.

If you want to login to an ECR server on another AWS account, you can set AWS_ECR_LOGIN_REGISTRY_IDS="id1,id2,id3".

The AWS ECR options are powered by an embedded version of the ECR plugin, so if you require options that aren't listed here, you can disable the embedded version as above and call the plugin directly. See it's README for more examples (requires Agent v3.x).

Versions

We recommend running the latest release, which is available at https://s3.amazonaws.com/buildkite-aws-stack/aws-stack.yml, or on the releases page.

The latest build of the stack is published to https://s3.amazonaws.com/buildkite-aws-stack/master/aws-stack.yml, along with a version for each commit in the form of https://s3.amazonaws.com/buildkite-aws-stack/master/${COMMIT}.aws-stack.yml.

Branches are published in the form of https://s3.amazonaws.com/buildkite-aws-stack/${BRANCH}/aws-stack.yml.

Updating Your Stack

To update your stack to the latest version use CloudFormation’s stack update tools with one of the urls in the Versions section.

Prior to updating, it's a good idea to set the desired instance size on the AutoscalingGroup to 0 manually.

CloudWatch Metrics

Metrics are calculated every minute from the Buildkite API using a lambda function.

cloudwatch

You’ll find the stack’s metrics under "Custom Metrics > Buildkite" within CloudWatch.

Reading Instance and Agent Logs

Each instance streams both system messages and Buildkite Agent logs to CloudWatch Logs under two log groups:

  • /var/log/messages - System logs
  • /var/log/buildkite-agent.log - Buildkite Agent logs
  • /var/log/docker - Docker daemon logs
  • /var/log/elastic-stack.log - Boot process logs

Within each stream the logs are grouped by instance id.

To debug an agent first find the instance id from the agent in Buildkite, head to your CloudWatch Logs Dashboard, choose either the system or Buildkite Agent log group, and then search for the instance id in the list of log streams.

Customizing Instances with a Bootstrap Script

You can customize your stack’s instances by using the BootstrapScriptUrl stack parameter to run a bash script on instance boot. To set up a bootstrap script, create an S3 bucket with the script, and set the BootstrapScriptUrl parameter, for example s3://my_bucket_name/my_bootstrap.sh.

If the file is private, you'll also need to create an IAM policy to allow the instances to read the file, for example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": ["arn:aws:s3:::my_bucket_name/my_bootstrap.sh"]
    }
  ]
}

Once you’ve created the policy, you must specify the policy’s ARN in the ManagedPolicyARN stack parameter.

Optimizing for Slow Docker Builds

For large legacy applications the Docker build process might take a long time on new instances. For these cases it’s recommended to create an optimized "builder" stack which doesn't scale down, keeps a warm docker cache and is responsible for building and pushing the application to Docker Hub before running the parallel build jobs across your normal CI stack.

An example of how to set this up:

  1. Create a Docker Hub repository for pushing images to
  2. Update the pipeline’s env hook in your secrets bucket to perform a docker login
  3. Create a builder stack with its own queue (i.e. elastic-builders)

Here is an example build pipeline based on a production Rails application:

steps:
  - name: ":docker: :package:"
    plugins:
      docker-compose:
        build: app
        image-repository: my-docker-org/my-repo
    agents:
      queue: elastic-builders
  - wait
  - name: ":hammer:"
    command: ".buildkite/steps/tests"
    plugins:
      docker-compose:
        run: app
    agents:
      queue: elastic
    parallelism: 75

See Issue 81 for ideas on other solutions (contributions welcome!).

Security

This repository hasn't been reviewed by security researchers so exercise caution and careful thought with what credentials you make available to your builds.

Anyone with commit access to your codebase (including third-party pull-requests if you've enabled them in Buildkite) will have access to your secrets bucket files.

Also keep in mind the EC2 HTTP metadata server is available from within builds, which means builds act with the same IAM permissions as the instance.

Development

To get started with customizing your own stack, or contributing fixes and features:

# Build all AMIs
make build

# Or, to set things up locally and create the stack on AWS
make create-stack

# You can use any of the AWS* environment variables that the aws-cli supports
AWS_PROFILE="some-profile" make create-stack

# You can also use aws-vault or similar
aws-vault exec some-profile -- make create-stack

If you need to build your own AMI (because you've changed something in the packer directory), run:

make packer

Questions and Support

Feel free to drop an email to support@buildkite.com with questions. It helps us if you can provide the following details:

# List your stack parameters
aws cloudformation describe-stacks --stack-name MY_STACK_NAME \
  --query 'Stacks[].Parameters[].[ParameterKey,ParameterValue]' --output table

Provide us with logs from Cloudwatch Logs:

/buildkite/elastic-stack/{instance-id}
/buildkite/systemd/{instance-id}

Alternately, drop by #aws-stack and #aws channels in Buildkite Community Slack and ask your question!

Licence

See Licence.md (MIT)