terraform-aws-modules/terraform-aws-lambda

AWS Lambda loses access to ECR image

aalloul opened this issue · 7 comments

Description

Hello,

I'm facing a weird issue where all of my Lambda functions seem to lose the ability to pull their images from ECR. Here's the context:

  • I have ~13 Lambda functions
  • they all use Docker images that are stored on ECR
  • Sometimes, as I'm making changes to one of them, all the other Lambda functions lose access to their images. On the console, I see the following message:
Failed to restore the function <lambda_function_name>: ERROR: Lambda cannot initialize the provided container image. Verify the image.

What I'm using:

  • ECR using source = "terraform-aws-modules/ecr/aws"
module "ecr" {
  source = "terraform-aws-modules/ecr/aws"

  repository_name   = "some_name"
  create_repository = true

  repository_read_write_access_arns = [data.aws_caller_identity.this.arn]
  create_lifecycle_policy           = true
  repository_lifecycle_policy       = jsonencode({
    rules = [
      {
        rulePriority = 1,
        description  = "Keep last 2 images.",
        selection    = {
          tagStatus     = "tagged",
          tagPrefixList = ["v"],
          countType     = "imageCountMoreThan",
          countNumber   = 2
        },
        action = {
          type = "expire"
        }
      }
    ]
  })

  repository_force_delete = false
}

data "aws_iam_policy_document" "registry" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.this.account_id}:root"]
    }

    actions = [
      "ecr:ReplicateImage",
    ]

    resources = [
      module.ecr.repository_arn
    ]
  }
}

module "ecr_registry" {
  source = "terraform-aws-modules/ecr/aws"

  create_repository = false

  # Registry Policy
  create_registry_policy = true
  registry_policy        = data.aws_iam_policy_document.registry.json

  repository_lifecycle_policy = jsonencode({
    "rules" : [
      {
        "rulePriority" : 1,
        "description" : "Keep only the last 2 images.",
        "selection" : {
          "tagStatus" : "any",
          "countType" : "imageCountMoreThan",
          "countNumber" : 2
        },
        "action" : {
          "type" : "expire"
        }
      }
    ]
  })

  # Registry Scanning Configuration
  manage_registry_scanning_configuration = false
  registry_scan_type                     = "basic"

  # Registry Replication Configuration
  create_registry_replication_configuration = false

}
  • Lambda functions are all created following this model:
module "lambda_docker_build" {

  source = "terraform-aws-modules/lambda/aws//modules/docker-build"

  create_ecr_repo = false
  ecr_repo        = var.ecr_repo

  image_tag        = "image_name:0.1"
  source_path      = "${path.module}/../back-end"
  docker_file_path = "path/to/Dockerfile"
}

module "lambda_function" {
  source = "terraform-aws-modules/lambda/aws"

  function_name = "${var.project_name}-lambda_function"

  create_package = false
  publish        = true

  # Use the docker image we built above
  image_uri    = module.lambda_docker_build.image_uri
  package_type = "Image"

  create_role = false
  lambda_role = aws_iam_role.basic_lambda_role.arn
  timeout     = 30
}
  • [x] I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
registry.terraform.io/terraform-aws-modules/apigateway-v2/aws:  2.2.1
registry.terraform.io/terraform-aws-modules/lambda/aws//modules/docker-build: 4.7.1
registry.terraform.io/terraform-aws-modules/rds/aws: 5.1.0
registry.terraform.io/terraform-aws-modules/lambda/aws: 4.7.1
registry.terraform.io/terraform-aws-modules/security-group/aws: 4.16.0
registry.terraform.io/terraform-aws-modules/vpc/aws: 3.14.2
registry.terraform.io/terraform-aws-modules/ecr/aws: 1.5.0
registry.terraform.io/terraform-aws-modules/ecs/aws: 3.5.0
  • Terraform version:
Terraform v1.3.4
on darwin_amd64
  • Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v4.39.0
+ provider registry.terraform.io/hashicorp/external v2.2.3
+ provider registry.terraform.io/hashicorp/local v2.2.3
+ provider registry.terraform.io/hashicorp/null v3.2.0
+ provider registry.terraform.io/hashicorp/random v3.4.3
+ provider registry.terraform.io/kreuzwerker/docker v2.23.0

Reproduction Code [Required]

Steps to reproduce the behavior:
It's really hard to know exactly which steps lead to this issue, but it seems to me that:

  • I work on a Docker image for one of the Lambda functions
  • I make multiple changes as required, test multiple times, and deploy multiple times
  • at some point, I realise that all the other Lambda functions display the same error as above

Expected behaviour

I don't expect the other Lambda functions to lose the ability to initialise.

Actual behaviour

Lambda functions lose the ability to access their ECR images. To work around this, I change the image versions and run terraform apply. This rebuilds all the images and pushes them to ECR. This seems to fix the issue until it happens again.
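For illustration, the workaround of changing the image version could be sketched as follows. This is only a sketch under assumptions: the variable image_version is hypothetical and not part of the configuration shown earlier, and the tag value mirrors the placeholder used in the description.

```hcl
# Hypothetical variable, introduced only for this illustration.
variable "image_version" {
  type    = string
  default = "0.2" # bumped from "0.1" to force a rebuild and push
}

module "lambda_docker_build" {
  source = "terraform-aws-modules/lambda/aws//modules/docker-build"

  create_ecr_repo = false
  ecr_repo        = var.ecr_repo

  # Changing the tag makes Terraform build and push a new image on the next apply.
  image_tag        = "image_name:${var.image_version}"
  source_path      = "${path.module}/../back-end"
  docker_file_path = "path/to/Dockerfile"
}
```

Running terraform apply after changing the version rebuilds the image and updates the function to point at the new image URI.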

I think you can hint Terraform by using depends_on = [module.lambda_docker_build] inside the lambda_function module block. This way Terraform will wait until the build has completed and the image has been published to ECR.

I can't imagine a reason for such behavior other than a race condition.
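A minimal sketch of that suggestion, reusing the module and resource names from the issue description (it is an ordering hint, not a confirmed fix):

```hcl
module "lambda_function" {
  source = "terraform-aws-modules/lambda/aws"

  function_name = "${var.project_name}-lambda_function"

  create_package = false
  publish        = true

  # Use the Docker image built by the docker-build module
  image_uri    = module.lambda_docker_build.image_uri
  package_type = "Image"

  create_role = false
  lambda_role = aws_iam_role.basic_lambda_role.arn
  timeout     = 30

  # Explicit ordering: wait for everything in the docker-build module
  # before creating or updating the function.
  depends_on = [module.lambda_docker_build]
}
```

Note that referencing module.lambda_docker_build.image_uri already gives Terraform an implicit dependency on that output; depends_on extends the ordering to every resource in the docker-build module. Module-level depends_on requires Terraform 0.13 or later.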

Hi @antonbabenko , thanks for your super quick reply! I'll definitely try that

Update: the error happened again even though I added depends_on pretty much everywhere.

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

This issue was automatically closed because of stale in 10 days

Hi @aalloul, were you able to solve this issue?

I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.