Terraform module to provision a self-hosted auto-scaling Gitlab runner on AWS spot or on-demand instances

AWS Gitlab Runner Terraform module

Introduction

This module provisions a self-hosted Gitlab runner with docker+machine executor and auto-scaling configuration.

Architecture

The architecture is fairly standard. It consists mainly of an EC2 instance (the manager) that has all required software installed and automatically registers itself with Gitlab. The manager spawns worker instances that run CI/CD jobs; it does not run any jobs itself.

Features:

Implementation notes:

  • This module is designed to work with Amazon Linux 2 AMIs. Other Linux distros most likely won't work!

Security considerations:

  • SSM Session Manager is the recommended way to access the manager instance, as it provides centralized access control plus full audit and activity logging
  • Consider limiting self-hosted runners to private and internal repositories, as running CI/CD pipelines for public repositories on your infrastructure introduces additional attack surface
  • Consider dedicating a separate VPC, subnets and AWS (sub)account to Gitlab runners to reduce blast radius and attack surface. Setting a budget and a billing alarm for your infrastructure may also be a wise choice

Cost optimization recommendations:

  • Consider purchasing a Savings Plan or Reserved Instance for the manager instance
  • Consider using AMD-powered EC2 instance types for the manager instance (they are 10% cheaper than the Intel-powered instances at the time of writing)
  • Spot Instances with a defined duration (also known as Spot blocks) have not been available to new AWS customers since July 1, 2021. For customers who have previously used the feature, AWS will continue to support Spot Instances with a defined duration until December 31, 2022 (see the AWS deprecation notice). If you don't need Spot Instances with a defined duration, set the spot_block_duration parameter to 0 in the runner config object.
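
As a rough illustration of the last point, the spot-related attributes of the runner object might look like the fragment below. This is only a sketch: the bid price is the illustrative value from the usage example, and the remaining runner attributes are omitted here even though the module expects the full object.

```hcl
runner = {
  # ... all other runner attributes stay as in your existing configuration ...

  request_spot_instances = true # keep requesting spot instances
  spot_bid_price         = 0.09 # illustrative value, same as in the usage example
  spot_block_duration    = 0    # 0 = no defined duration, i.e. no Spot block
}
```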

Other recommendations:

  • If you use the distributed cache feature, consider provisioning a Gateway VPC Endpoint for S3 and routing all S3 traffic through it to avoid additional data transfer charges and to keep this traffic on the AWS backbone network
  • Make sure to acquaint yourself with the caveats related to using Spot instances for running CI/CD jobs
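
The Gateway VPC Endpoint mentioned above is not created by this module. A minimal sketch of provisioning one alongside it could look like the following. The variable names are hypothetical and exist only for this illustration; the route tables should be the ones used by the subnets where the runners are launched.

```hcl
# Hypothetical inputs for this sketch; replace with your own values.
variable "aws_region" {
  type    = string
  default = "us-west-2"
}

variable "vpc_id" {
  type = string
}

variable "route_table_ids" {
  type = list(string)
}

# Gateway endpoint that routes S3 traffic from the given route tables
# through the endpoint instead of a NAT gateway or internet gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids
}
```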

Backlog:

  • Allow manager instance deployment as ECS service with Fargate launch type
  • Add examples to the repo
  • Support Autoscaling periods
  • Add an option to request regular on-demand instances instead of the spot

This module is backed by best-of-breed Terraform modules maintained by Cloudposse.

Terraform versions

Terraform 0.12. Pin module version to ~> 1.0. Submit pull-requests to terraform012 branch.

Terraform 0.13. Pin module version to ~> 2.0. Submit pull-requests to master branch.

Usage

IMPORTANT: The master branch is used in source just as an example. In your code, do not pin to master because there may be breaking changes between releases. Instead pin to the release tag (e.g. ?ref=tags/x.y.z) of one of our latest releases.

This example creates a Gitlab runner in the us-west-2 region, availability zone d, with the registration token passed via the registration_token variable.

data "aws_ami" "amzn_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-ebs"]
  }
}

data "aws_ami" "ubuntu_18_04" {
  most_recent = true
  owners      = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*"]
  }
}

module "gitlab_runner" {
  source  = "aleks-fofanov/runner-aws-spot/gitlab"
  version = "~> 2.0"
  
  name      = "stack"
  namespace = "cp"
  stage     = "prod"
  
  registration_token = "XXXXXXXX"

  availability_zone = "d"

  vpc = {
    vpc_id     = "XXXXXXXX"
    cidr_block = "10.0.0.0/16"
  }
  
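  manager = {
    ami_id                      = data.aws_ami.amzn_linux_2.id
    ami_owner                   = "amazon"
    instance_type               = "t3a.micro"
    key_pair                    = null
    subnet_id                   = "subnet-XXXXXXXX"
    associate_public_ip_address = true
    assign_eip_address          = false
    enable_detailed_monitoring  = false
    root_volume_size            = 8
    ebs_optimized               = false

    # The manager input type (see Inputs) also declares these instance metadata
    # attributes. The values below are illustrative and mirror the EC2 defaults.
    metadata_http_endpoint_enabled       = true
    metadata_http_put_response_hop_limit = 1
    metadata_http_tokens_required        = false
  }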

  runner = {
    concurrent = 2
    limit      = 2
    tags       = ["shared", "docker", "spot", "us-west-2d"]
    image      = "docker:20.10"

    instance_type       = "c5a.large"
    ami_id              = data.aws_ami.ubuntu_18_04.id
    use_private_address = true

    run_untagged    = false
    lock_to_project = true

    spot_bid_price         = 0.09
    spot_block_duration    = 60
    request_spot_instances = true

    idle = {
      count = 0
      time  = 1200
    }
    autoscaling_periods = [
      {
        periods    = ["* * 9-17 * * mon-fri *"]
        idle_count = 1
        idle_time  = 1200
        timezone   = "UTC"
      }
    ]
  }
}

Please refer to the examples folder for a complete example.
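
As an alternative to a plain-text token, the Inputs table lists registration_token_ssm_param, which takes precedence over registration_token. A hedged sketch of that variant follows. The parameter name and the KMS key alias are hypothetical, the parameter is assumed to exist before the module is applied, and the remaining arguments are left out.

```hcl
module "gitlab_runner" {
  source  = "aleks-fofanov/runner-aws-spot/gitlab"
  version = "~> 2.0"

  # ... name, vpc, manager, runner and other arguments as in the example above ...

  # Name of an existing SSM parameter that holds the registration token
  # (hypothetical name, used only for illustration).
  registration_token_ssm_param = "/gitlab/runner/registration-token"

  # Identifier of the KMS key that encrypts the parameter
  # (hypothetical alias, used only for illustration).
  registration_token_ssm_param_kms_key = "alias/gitlab-runner"
}
```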

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13 |
| aws | >= 3.0 |
| local | >= 2.1.0 |
| null | >= 3.0 |

Providers

| Name | Version |
|------|---------|
| aws | >= 3.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| aggregated_policy | cloudposse/iam-policy-document-aggregator/aws | 0.8.0 |
| auth_token_ssm_param_label | cloudposse/label/null | 0.25.0 |
| default_label | cloudposse/label/null | 0.25.0 |
| manager_instance | cloudposse/ec2-instance/aws | 0.30.4 |
| manager_label | cloudposse/label/null | 0.25.0 |
| runner_label | cloudposse/label/null | 0.25.0 |
| s3_cache_bucket | cloudposse/s3-bucket/aws | 0.33.0 |

Resources

| Name | Type |
|------|------|
| aws_cloudwatch_log_group.manager | resource |
| aws_iam_instance_profile.runner | resource |
| aws_iam_policy.manager | resource |
| aws_iam_role.runner | resource |
| aws_iam_role_policy_attachment.manager | resource |
| aws_iam_role_policy_attachment.manager_cloudwatch_logs | resource |
| aws_iam_role_policy_attachment.manager_ssm_sessions | resource |
| aws_iam_service_linked_role.autoscaling | resource |
| aws_iam_service_linked_role.spot | resource |
| aws_s3_bucket_public_access_block.cache | resource |
| aws_security_group.manager | resource |
| aws_security_group.runners | resource |
| aws_security_group_rule.manager_egress | resource |
| aws_security_group_rule.manager_metrics_from_allowed | resource |
| aws_security_group_rule.manager_ssh_from_allowed | resource |
| aws_security_group_rule.runners_docker_machine_from_vpc | resource |
| aws_security_group_rule.runners_egress | resource |
| aws_security_group_rule.runners_icmp_from_vpc | resource |
| aws_security_group_rule.runners_ssh_from_vpc | resource |
| aws_ssm_parameter.authentication_token | resource |
| aws_availability_zone.default | data source |
| aws_caller_identity.default | data source |
| aws_iam_policy_document.authentication_token_kms_permissions | data source |
| aws_iam_policy_document.authentication_token_ssm_param_permissions | data source |
| aws_iam_policy_document.cache | data source |
| aws_iam_policy_document.create_service_linked_roles | data source |
| aws_iam_policy_document.docker_machine | data source |
| aws_iam_policy_document.ecr | data source |
| aws_iam_policy_document.registration_token_kms_permissions | data source |
| aws_iam_policy_document.registration_token_ssm_param_permissions | data source |
| aws_iam_policy_document.runner_assume | data source |
| aws_iam_policy_document.ssm_sessions | data source |
| aws_kms_key.authentication_token | data source |
| aws_kms_key.registration_token | data source |
| aws_partition.default | data source |
| aws_region.default | data source |
| aws_ssm_parameter.registration_token | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| additional_security_groups | List of Security Group IDs allowed to be associated with manager instance | list(string) | [] | no |
| allowed_metrics_cidr_blocks | CIDR blocks that should be able to access metrics port exposed on manager instance | list(object({ cidr_blocks = list(string), description = string })) | [] | no |
| allowed_ssh_cidr_blocks | CIDR blocks that should be able to communicate with manager's port 22 | list(object({ cidr_blocks = list(string), description = string })) | [] | no |
| attributes | Additional attributes, e.g. 1 | list(string) | [] | no |
| authentication_token_ssm_param | An override for SSM Parameter name that will store runner authentication token | string | null | no |
| authentication_token_ssm_param_kms_key | Identifier of KMS key used for encryption of SSM Parameter that will store authentication token | string | null | no |
| availability_zone | Availability Zone (e.g. a, b, c etc.) for instances to be launched in | string | "a" | no |
| cloudwatch_logs_kms_key_arn | The ARN of the KMS Key to use when encrypting log data. Please note, after the AWS KMS CMK is disassociated from the log group, AWS CloudWatch Logs stops encrypting newly ingested data for the log group. All previously ingested data remains encrypted, and AWS CloudWatch Logs requires permissions for the CMK whenever the encrypted data is requested. | string | null | no |
| cloudwatch_logs_retention | Number of days you want to retain log events in Cloudwatch log group | number | 30 | no |
| create_autoscaling_service_linked_role | Defines whether to create service-linked role for EC2 autoscaling | bool | true | no |
| create_spot_service_linked_role | Defines whether to create service-linked role for EC2 spot instances | bool | true | no |
| delimiter | Delimiter to be used between namespace, name, stage and attributes | string | "-" | no |
| docker_machine_version | Docker machine version to be installed on manager instance | string | "0.16.2-gitlab.13" | no |
| enable_access_to_ecr_repositories | A list of ECR repositories in specified region that manager instance should have read-only access to | list(string) | [] | no |
| enable_cloudwatch_logs | Defines whether manager instance should ship its logs to Cloudwatch | bool | true | no |
| enable_s3_cache | Defines whether an S3 bucket should be created and used as a source for distributed cache | bool | true | no |
| enable_ssm_sessions | Defines whether access via SSM Session Manager should be enabled for manager instance | bool | true | no |
| gitlab_runner_version | Gitlab runner version to be installed on manager instance | string | "14.2.0" | no |
| gitlab_url | Gitlab URL | string | "https://gitlab.com" | no |
| manager | Runners' manager (aka bastion) configuration | object({ ami_id = string, ami_owner = string, instance_type = string, key_pair = string, subnet_id = string, associate_public_ip_address = bool, assign_eip_address = bool, root_volume_size = number, ebs_optimized = bool, enable_detailed_monitoring = bool, metadata_http_endpoint_enabled = bool, metadata_http_put_response_hop_limit = number, metadata_http_tokens_required = bool }) | n/a | yes |
| metrics_port | See https://docs.gitlab.com/runner/monitoring/#configuration-of-the-metrics-http-server for more details | number | 9252 | no |
| name | Solution name, e.g. 'app' or 'jenkins' | string | n/a | yes |
| namespace | Namespace (e.g. cp or cloudposse) | string | "" | no |
| registration_token | Runner registration token | string | null | no |
| registration_token_ssm_param | SSM Parameter name that stores runner registration token. This parameter takes precedence over registration_token | string | null | no |
| registration_token_ssm_param_kms_key | Identifier of KMS key used for encryption of SSM Parameter that stores registration token | string | null | no |
| runner | Gitlab runner configuration. See https://docs.gitlab.com/runner/configuration/advanced-configuration.html | object({ concurrent = number, limit = number, image = string, tags = list(string), use_private_address = bool, instance_type = string, ami_id = string, run_untagged = bool, lock_to_project = bool, idle = object({ count = number, time = number }), autoscaling_periods = list(object({ periods = list(string), idle_count = number, idle_time = number, timezone = string })), request_spot_instances = bool, spot_bid_price = number, spot_block_duration = number }) | n/a | yes |
| runner_advanced_config | Advanced configuration options for gitlab runner | object({ pre_build_script = string, post_build_script = string, pre_clone_script = string, environment = list(string), request_concurrency = number, output_limit = number, shm_size = number, max_builds = number, pull_policy = string, additional_volumes = list(string), additional_docker_machine_options = list(string), root_volume_size = number, ebs_optimized = bool, enable_detailed_monitoring = bool }) | { "additional_docker_machine_options": [], "additional_volumes": ["/certs/client"], "ebs_optimized": false, "enable_detailed_monitoring": false, "environment": [], "max_builds": 0, "output_limit": 4096, "post_build_script": "", "pre_build_script": "", "pre_clone_script": "", "pull_policy": "always", "request_concurrency": 1, "root_volume_size": 20, "shm_size": 0 } | no |
| s3_cache_expiration | Number of days you want to retain cache in S3 bucket | number | 45 | no |
| s3_cache_infrequent_access_transition | Number of days to persist in the standard storage tier before moving to the infrequent access tier | number | 30 | no |
| stage | Stage (e.g. prod, dev, staging) | string | "" | no |
| tags | Additional tags (e.g. map(BusinessUnit,XYZ)) | map(string) | {} | no |
| vpc | VPC configuration | object({ vpc_id = string, cidr_block = string }) | n/a | yes |
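
To make the shape of the object-typed inputs more concrete, here is a sketch that sets allowed_metrics_cidr_blocks and overrides two fields of runner_advanced_config. The CIDR block, the description and the two changed values are illustrative assumptions. Because runner_advanced_config is declared as an object type, Terraform 0.13 expects every attribute to be present, so the other fields simply repeat the defaults from the table above.

```hcl
module "gitlab_runner" {
  source  = "aleks-fofanov/runner-aws-spot/gitlab"
  version = "~> 2.0"

  # ... required arguments (name, vpc, manager, runner) omitted for brevity ...

  # Allow a hypothetical monitoring subnet to scrape the metrics port.
  allowed_metrics_cidr_blocks = [
    {
      cidr_blocks = ["10.0.10.0/24"]
      description = "Monitoring subnet"
    }
  ]

  runner_advanced_config = {
    pre_build_script                  = ""
    post_build_script                 = ""
    pre_clone_script                  = ""
    environment                       = []
    request_concurrency               = 1
    output_limit                      = 4096
    shm_size                          = 0
    max_builds                        = 10 # changed from the default of 0
    pull_policy                       = "always"
    additional_volumes                = ["/certs/client"]
    additional_docker_machine_options = []
    root_volume_size                  = 40 # changed from the default of 20
    ebs_optimized                     = false
    enable_detailed_monitoring        = false
  }
}
```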

Outputs

| Name | Description |
|------|-------------|
| auth_token_ssm_param_arn | ARN of SSM Parameter that stores runner's authentication token |
| auth_token_ssm_param_name | Name of SSM Parameter that stores runner's authentication token |
| manager_instance | Disambiguated ID of manager instance |
| manager_instance_cloudwatch_alarm | CloudWatch Alarm ID created for manager instance |
| manager_instance_cloudwatch_log_group_arn | ARN of CloudWatch Log Group created for manager instance |
| manager_instance_cloudwatch_log_group_name | Name of CloudWatch Log Group created for manager instance |
| manager_instance_name | Manager instance name |
| manager_instance_policy_arn | ARN of AWS IAM Policy associated with manager instance IAM role |
| manager_instance_policy_name | Name of AWS IAM Policy associated with manager instance IAM role |
| manager_instance_primary_security_group_id | ID of the security group created for and associated with manager instance |
| manager_instance_private_dns | Private DNS of manager instance |
| manager_instance_private_ip | Private IP of manager instance |
| manager_instance_public_dns | Public DNS of manager instance (or DNS of EIP) |
| manager_instance_public_ip | Public IP of manager instance (or EIP) |
| manager_instance_role_arn | ARN of AWS IAM Role associated with manager instance |
| manager_instance_role_name | Name of AWS IAM Role associated with manager instance |
| manager_instance_security_group_ids | List of all security group IDs associated with manager instance |
| manager_instance_ssh_key_pair | Name of the SSH key pair provisioned on manager instance |
| runner_instance_primary_security_group_id | ID of the security group created for and associated with runner instance(s) |
| runner_instance_role_arn | ARN of AWS IAM Role associated with runner instance(s) |
| runner_instance_role_name | Name of AWS IAM Role associated with runner instance(s) |
| s3_cache_bucket_arn | Cache bucket ARN |
| s3_cache_bucket_id | Cache bucket Name (aka ID) |
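
A brief sketch of how these outputs might be consumed from the calling configuration, assuming the module block is named gitlab_runner as in the usage example. The internal service, its security group variable and the port are hypothetical and serve only as an illustration.

```hcl
# Hypothetical security group of an internal service that CI jobs need to reach.
variable "internal_service_security_group_id" {
  type = string
}

# Re-export the manager instance ID, e.g. for use as an SSM Session Manager target.
output "gitlab_runner_manager_instance_id" {
  value = module.gitlab_runner.manager_instance
}

# Allow worker instances to reach the internal service over HTTPS.
resource "aws_security_group_rule" "runners_to_internal_service" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = var.internal_service_security_group_id
  source_security_group_id = module.gitlab_runner.runner_instance_primary_security_group_id
}
```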

Authors

Module is created and maintained by Aleksandr Fofanov.

License

Apache 2 Licensed. See LICENSE for full details.

Help

Got a question?

File a GitHub issue.