The Buildkite Elastic CI Stack gives you a private, autoscaling Buildkite Agent cluster. Use it to parallelize legacy tests across hundreds of nodes, run tests and deployments for all your Linux-based services and apps, or run AWS ops tasks.
Features:
- All major AWS regions
- Configurable instance size
- Configurable number of buildkite agents per instance
- Configurable spot instance bid price
- Configurable auto-scaling based on build activity
- Docker and Docker Compose support
- Per-pipeline S3 secret storage (with SSE encryption support)
- Docker Registry push/pull support
- CloudWatch logs for system and buildkite agent events
- CloudWatch metrics from the Buildkite API
- Support for stable, unstable or experimental Buildkite Agent releases
- Create as many instances of the stack as you need
- Rolling updates to stack instances to reduce interruption
- Getting Started
- What’s On Each Machine?
- What Type of Builds Does This Support?
- Multiple Instances of the Stack
- Autoscaling Configuration
- Configuration Environment Variables
- Secrets Bucket Support
- Docker Registry Support
- Updating Your Stack
- CloudWatch Metrics
- Reading Instance and Agent Logs
- Optimizing for Slow Docker Builds
- Security
- Questions?
- Licence
See the Elastic CI Stack for AWS guide for a step-by-step guide, or jump straight in:
Although the stack will create it's own VPC by default, we highly recommend following best practice by setting up a separate development AWS account and using role switching and consolidated billing—see the Delegate Access Across AWS Accounts tutorial for more information.
If you'd like to use the AWS CLI, download config.json.example
, rename it to config.json
, and then run the below command:
aws cloudformation create-stack \
--output text \
--stack-name buildkite \
--template-url "https://s3.amazonaws.com/buildkite-aws-stack/aws-stack.json" \
--capabilities CAPABILITY_IAM \
--parameters <(cat config.json)
If you’d prefer to use this repo or build it yourself, clone it and run the following commands:
# To set up your local environment and build a template based on public AMIs
make setup download-mappings build
# Or, to set things up locally and create the stack on AWS
make create-stack
# You can use any of the AWS* environment variables that the aws-cli supports
AWS_PROFILE="some-profile" make create-stack
# You can also use aws-vault or similar
aws-vault exec some-profile -- make create-stack
Adding extra tags to the stack (including the EC2 instances) can be done via extra_tags.json
(see extra_tags.json.example
for usage).
- Amazon Linux
- Buildkite Agent
- Docker
- Docker Compose
- aws-cli - useful for performing any ops-related tasks
- jq - useful for manipulating JSON responses from cli tools such as aws-cli or the Buildkite API
- docker-gc - removes old docker images
- lifecycled - manages AWS autoscaling events
This stack is designed to run your builds in a share-nothing pattern similar to the 12 factor application principals:
- Each project should encapsulate it's dependencies via Docker and Docker Compose
- Build pipeline steps should assume no state on the machine (and instead rely on build meta-data, build artifacts or S3)
- Secrets are configured via environment variables exposed using the S3 secrets bucket
By following these simple conventions you get a scaleable, repeatable and source-controlled CI environment that any team within your organization can use.
If you need to optimize pipelines for your types of applications you can create multiple stack with different configurations, each with a different Agent Queue.
For example, you could have a builders
stack that provides always on machines with warm Docker caches for building and pushing to a Docker registry at the start of a CI run. Or you could have a single t2.nano
stack that is used for lightning fast buildkite-agent pipeline upload
jobs.
Because each stack can run in a different agent queue, and each one self-contained (potentially in completely different AWS accounts), you're free to experiment without interrupting existing builds.
If you provided a BuildkiteApiAccessToken
your build agents will autoscale. Autoscaling is designed to scale up quite quickly and then gradually scale down. Scaling up happens when there are scheduled jobs exist that are waiting for agents. Scaling down happens when there are no more running jobs.
See the autoscale.yml template for more details, or the Buildkite Metrics Publisher project for how metrics are collected.
The following environment variables can be set on the Buildkite pipeline, or individual build step, to customize the behaviour of the stack:
BUILDKITE_SECRETS_BUCKET
- the name of the S3 bucket where secrets are stored. Default: the value set in the stack parameter when the stack was created. Example:my-secrets-bucket
BUILDKITE_SECRETS_KEY
- the encryption key used to decrypt objects from the secrets bucket. Default: nil. Example:w2Uzhc4kXXbW//T9zaY3neoCbR9roQ10
BUILDKITE_SECRETS_PREFIX
- the folder within the secrets bucket. Default: the build pipeline's slug. Example:my-great-pipeline
SSH_KEY_NAME
- the filename of the SSH key inside this pipeline’s folder in the secrets bucket. Default:private_ssh_key
. Example:other_ssh_key
SHARED_SSH_KEY_NAME
- the filename of the SSH key in the root of the secrets bucket if there's no pipeline-specific SSH key present. Default:private_ssh_key
. Example:other_ssh_key
The stack has a SecretsBucket
parameter which will allow your build agents to automatically get access to SSH private keys and environment hooks for exposing environment variables to builds. The stack doesn't create the bucket for you, you need to do this yourself, but it does create a role that gives read access to the build machines.
The secrets bucket can contain the following files:
/private_ssh_key
- An optional private key to use for Git SSH operations when there is no pipeline-specific key present/{pipeline-slug}/env
- An optional bash script to use as an agent environment hook/{pipeline-slug}/private_ssh_key
- An optional pipeline-specific private key to use for Git SSH operations
The files in your secrets bucket should be encrypted with server-side object encryption to ensure they are reasonably secure. See the Security section for more details.
Encryption is done via the BUILDKITE_SECRETS_KEY
environment variable set via the Buildkite pipeline settings, and can be the same, or different, for each pipeline.
Here’s an an example (for OS X) that shows how to create and copy a random encyption passphrase, generate a private SSH key, and upload it with SSE encryption to an S3 bucket:
# generate a deploy key for your project
ssh-keygen -t rsa -b 4096 -f id_rsa_buildkite
pbcopy < id_rsa_buildkite.pub # paste this into your github deploy key
# upload the private key, encrypted
PASSPHRASE=$(head -c 24 /dev/urandom | base64)
aws s3 cp --acl private --sse-c --sse-c-key "$PASSPHRASE" id_rsa_buildkite "s3://{SecretsBucket}/private_ssh_key"
pbcopy <<< "$PASSPHRASE" # paste passphrase into buildkite env as BUILDKITE_SECRETS_KEY
# cleanup
unset PASSPHRASE
rm id_rsa_buildkite*
If you want to push or pull from Docker Hub you can use the env
file in your secrets bucket to export DOCKER_HUB_USER
, DOCKER_HUB_PASSWORD
and DOCKER_HUB_EMAIL
. This will perform a docker login
before each pipeline step is run, allowing you to docker push
to Docker Hub.
If you want to use AWS ECR instead of Docker Hub there's no need to worry about credentials, you simply ensure that your agent machines have the necessary IAM roles and permissions.
For all other services you’ll need to perform your own docker login
commands using the env
hook.
To update your stack to the latest version use CloudFormation’s stack update tools with this S3 URL:
https://s3.amazonaws.com/buildkite-aws-stack/aws-stack.json
After updating the stack you may need to recycle your machines by changing the auto-scale groups to 0, and then back to the desired number.
Note: If you use spot pricing and a MinSize
greater than 0 you’ll first need to first change MinSize
to 0 before updating your stack using the above S3 URL.
This stack includes the Buildkite Metrics Publisher nested within it, which runs a small instance to monitor the Buildkite API and report the metrics to CloudWatch.
You’ll find the stack’s metrics under "Custom Metrics > Buildkite" within CloudWatch.
Each instance streams both system messages and Buildkite Agent logs to CloudWatch Logs under two log groups:
/var/log/messages
- system logs/var/log/buildkite-agent.log
- Buildkite Agent logs/var/log/docker
- Docker daemon logs
Within each stream the logs are grouped by instance id.
To debug an agent first find the instance id from the agent in Buildkite, head to your CloudWatch Logs Dashboard, choose either the system or Buildkite Agent log group, and then search for the instance id in the list of log streams.
For large legacy applications the Docker build process might take a long time on new instances. For these cases it’s recommended to create an optimized "builder" stack which doesn't scale down, keeps a warm docker cache and is responsible for building and pushing the application to Docker Hub before running the parallel build jobs across your normal CI stack.
An example of how to set this up:
- Create a Docker Hub repository for pushing images to
- Update the pipeline’s
env
hook in your secrets bucket to perform adocker login
- Create a builder stack with its own queue (i.e.
elastic-builders
), making sure to usebeta
agents so you can use the Docker Compose Buildkite Plugin and pre-building
Here is an example build pipeline based on a production Rails application:
steps:
- name: ":docker: :package:"
plugins:
docker-compose:
build: app
image-repository: index.docker.io/my-docker-org/my-repo
agents:
queue: elastic-builders
- wait
- name: ":hammer:"
command: ".buildkite/steps/tests"
plugins:
docker-compose:
run: app
agents:
queue: elastic
parallelism: 75
See Issue 81 for ideas on other solutions (contributions welcome!).
This repository hasn't been reviewed by security researchers so exercise caution and careful thought with what credentials you make available to your builds.
Anyone with commit access to your codebase (including third-party pull-requests if you've enabled them in Buildkite) will have access to your secrets bucket files, and anyone with access to the Buildkite pipeline settings will be able to access the pipeline’s secrets encryption key. In combination, the attacker would have access to your decrypted secrets.
Also keep in mind the EC2 HTTP metadata server is available from within builds, which means builds act with the same IAM permissions as the instance.
Feel free to drop me an email to support@buildkite.com with questions, or checkout the #aws-stack
and #aws
channels in Buildkite Slack.
See Licence.md (MIT)