philips-labs/terraform-aws-github-runner

Error: InvalidFleetConfig The fleet configuration contains duplicate instance pools.

dgokcin opened this issue · 8 comments

Hello,

I'm trying to deploy the module into my infrastructure and I'm running into problems with the scale-up lambda.

Any idea why I might have an invalid fleet config? Thanks!

include {
  path = find_in_parent_folders()
}

locals {
  common_vars = yamldecode(file(find_in_parent_folders("common_vars.yaml")))
}

terraform {
  source = "../../..//modules/terraform-aws-github-runner"
}

inputs = {
  aws_region            = "eu-west-1"
  vpc_id                = dependency.vpc.outputs.vpc_id
  subnet_ids            = dependency.vpc.outputs.private_subnets
  lambda_s3_bucket      = "myorg-pg-gh-actions-lambdas"
  runners_lambda_s3_key = "runners.zip"
  syncer_lambda_s3_key  = "runner-binaries-syncer.zip"
  webhook_lambda_s3_key = "webhook.zip"

  prefix = "gh-ci"
  github_app = {
    key_base64     = "key"
    id             = "id"
    webhook_secret = "webhook_secret"
  }

  create_service_linked_role_spot = true

  enable_organization_runners = true
  runner_extra_labels         = "default,example"

  instance_types = ["m5.large", "c5.large"]

  # override delay of events in seconds
  delay_webhook_event   = 5
  runners_maximum_count = 1

  # set up a FIFO queue to preserve order
  enable_fifo_build_queue = true

  # override scaling down
  scale_down_schedule_expression = "cron(* * * * ? *)"
  # enable this flag to publish webhook events to workflow job queue
  # enable_workflow_job_events_queue  = true

  enable_user_data_debug_logging_runner = true

  tags = local.common_vars.tags
}

dependency "vpc" {
  config_path = "../vpc"
}

[screenshot: scale-up lambda logs showing the InvalidFleetConfig error]

Update: the above was the default configuration. I also tried deploying the module with multiple_runners. I believe the following code is failing, and I have no idea what is going on. This is my first time working with spot instances, so I'd really appreciate any help.

Thanks

    let fleet;
    try {
      // see https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html for the request spec
      fleet = await ec2
        .createFleet({
          LaunchTemplateConfigs: [
            {
              LaunchTemplateSpecification: {
                LaunchTemplateName: runnerParameters.launchTemplateName,
                Version: '$Default',
              },
              // overrides are generated per (subnet, instance type) combination
              Overrides: generateFleetOverrides(
                runnerParameters.subnets,
                runnerParameters.ec2instanceCriteria.instanceTypes,
                amiIdOverride,
              ),
            },
          ],
          SpotOptions: {
            MaxTotalPrice: runnerParameters.ec2instanceCriteria.maxSpotPrice,
            AllocationStrategy: runnerParameters.ec2instanceCriteria.instanceAllocationStrategy,
          },
          TargetCapacitySpecification: {
            TotalTargetCapacity: numberOfRunners,
            DefaultTargetCapacityType: runnerParameters.ec2instanceCriteria.targetCapacityType,
          },
          TagSpecifications: [
            {
              ResourceType: 'instance',
              Tags: [
                { Key: 'ghr:Application', Value: 'github-action-runner' },
                { Key: 'Type', Value: runnerParameters.runnerType },
                { Key: 'Owner', Value: runnerParameters.runnerOwner },
              ],
            },
          ],
          Type: 'instant',
        })
        .promise();
    } catch (e) {
      logger.warn('Create fleet request failed.', e);
      throw e;
    }

@dgokcin Do your private subnets (which you seem to inject via a Terragrunt dependency) reside in different availability zones?

(A quick Google search turns up an issue in an unrelated repo that exhibits the same error message.)
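
For context, CreateFleet derives one spot capacity pool per (availability zone, instance type) pair from the launch template overrides, so the overrides behave roughly like a cross product of the configured subnets and instance types: two subnets in the same AZ (or a repeated instance type) collapse into duplicate pools, which the API rejects with InvalidFleetConfig. A rough Terraform-style illustration of that cross product (values are hypothetical, for illustration only):

    locals {
      subnet_azs     = ["eu-west-1a", "eu-west-1a"] # two subnets, same AZ
      instance_types = ["m5.large", "c5.large"]

      # one pool per (AZ, instance type) pair; the repeated AZ above yields
      # duplicate pairs, analogous to the duplicate pools CreateFleet rejects
      pools = flatten([
        for az in local.subnet_azs : [
          for t in local.instance_types : "${az}/${t}"
        ]
      ])
      # => ["eu-west-1a/m5.large", "eu-west-1a/c5.large",
      #     "eu-west-1a/m5.large", "eu-west-1a/c5.large"]
    }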

Yeah, I figured it out from that exact post. In my test environment I have multiple subnets in the same AZ; providing a single private subnet resolved my problem.

Thanks!
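
For anyone hitting this later: instead of dropping down to a single subnet, a minimal Terraform sketch (variable and local names are hypothetical) can deduplicate the list to one subnet per AZ before handing it to the module's subnet_ids input:

    variable "private_subnet_ids" {
      type = list(string)
    }

    data "aws_subnet" "private" {
      for_each = toset(var.private_subnet_ids)
      id       = each.value
    }

    locals {
      # group subnet ids by AZ, then keep the first subnet in each zone
      subnets_by_az         = { for s in data.aws_subnet.private : s.availability_zone => s.id... }
      subnet_ids_one_per_az = [for az, ids in local.subnets_by_az : ids[0]]
    }

The module's subnet_ids input can then be set to local.subnet_ids_one_per_az.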

@npalm Maybe it'd be a good idea to highlight this requirement (configured subnets must reside in different AZs) in the subnet_ids input documentation?

I agree, since I only figured it out by chance after several hours of research that eventually led me to a different repo.

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

Just a heads-up: this error can also occur (as a misleading false positive) if you accidentally enter duplicate EC2 instance types in the list of strings supplied to this module. Lesson learned: wrap the list in the distinct() function.
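
A minimal sketch of that fix in the Terragrunt inputs (the instance types here are just placeholders; Terragrunt exposes Terraform's built-in functions, including distinct()):

    inputs = {
      # distinct() drops the accidental duplicate, so CreateFleet never
      # sees two identical instance pools
      instance_types = distinct(["m5.large", "c5.large", "m5.large"])
      # => ["m5.large", "c5.large"]
    }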

@IAXES You saved me what could have been hours of frustration!! ❤️ ❤️ ❤️