SumoLogic/terraform-provider-sumologic

Eventual Consistency Leading to 400 Error in sumologic_kinesis_log_source creation


The Sumo Logic provider does not appear to use any retry logic when creating a sumologic_kinesis_log_source, or at least none that handles the eventual consistency of IAM role authentication. Given the following HCL:

resource "sumologic_kinesis_log_source" "cwl-shipper" {
  name         = "${var.env}-${var.product}-${var.log_name} kinesis firehose logs (${var.region})"
  description  = "Managed by Terraform"
  collector_id = var.sumo-collector_id
  content_type = "KinesisLog"
  category     = var.sumo-source_category

  path {
    type            = "KinesisLogPath"
    bucket_name     = module.firehose_with_s3.failures_bucket.name
    path_expression = "http-endpoint-failed/${local.failed_log_path}*"
  }

  authentication {
    type     = "AWSRoleBasedAuthentication"
    role_arn = var.sumologic_role.arn
  }

  depends_on = [aws_iam_role_policy.sumo-pull_logs_from_s3]
}

resource "aws_iam_role_policy" "sumo-pull_logs_from_s3" {
  name   = local.name
  role   = var.sumologic_role["name"]
  policy = data.aws_iam_policy_document.sumo-pull_logs_from_s3.json
}

data "aws_iam_policy_document" "sumo-pull_logs_from_s3" {
  statement {
    effect = "Allow"

    actions = [
      "s3:ListBucketVersions",
      "s3:ListBucket",
      "s3:GetObjectVersion",
      "s3:GetObject",
    ]

    resources = [
      module.firehose_with_s3.failures_bucket.arn,
      "${module.firehose_with_s3.failures_bucket.arn}/*"
    ]
  }
}

We consistently receive this error on initial creation:

│ Error: {
│   "status" : 400,
│   "id" : "IVE-REPLACED-THISID",
│   "code" : "collectors.validation.fields.invalid",
│   "message" : "The S3 bucket 'bucketName=bucket-name-replaced-here-but-the-bucket-definitely-exists' is not readable. The following permission is missing : 'missingPermission=ListBucketVersions'."
│ }
│
│   with module.us-west-2.module.common.module.api.module.modulenamegoeshere.sumologic_kinesis_log_source.cwl-shipper,
│   on .terraform/modules/us-west-2.common.api.modulenamegoeshere/sumologic-collector.tf line 1, in resource "sumologic_kinesis_log_source" "cwl-shipper":
│    1: resource "sumologic_kinesis_log_source" "cwl-shipper" {

We know this is not actually a permissions error because the very next apply, with zero changes, succeeds with no errors. That points to the eventual consistency that the docs warn about:

Now that you have completed these steps and have created an IAM Role you need to wait two to five minutes before using it for an AWS Source's authentication. This is to account for AWS's eventual consistency.

This seems like a perfect use case for backoff/retry logic in the provider.
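
In the meantime, a possible stopgap on the configuration side is to put an explicit delay between the IAM policy and the source, along the lines of the wait the docs recommend. The following is an untested sketch, not a confirmed fix: it assumes the hashicorp/time provider is available, and the resource name `wait_for_iam_propagation` and the 120s duration are illustrative choices (the docs suggest two to five minutes).

```hcl
# Hypothetical workaround: pause after the IAM policy is attached so that
# AWS has time to propagate it before Sumo Logic validates bucket access.
# Assumes the hashicorp/time provider; name and duration are illustrative.
resource "time_sleep" "wait_for_iam_propagation" {
  create_duration = "120s"

  depends_on = [aws_iam_role_policy.sumo-pull_logs_from_s3]
}

resource "sumologic_kinesis_log_source" "cwl-shipper" {
  # ... same arguments as in the resource above ...

  # Depend on the sleep instead of depending on the policy directly.
  depends_on = [time_sleep.wait_for_iam_propagation]
}
```

A fixed sleep only approximates the propagation delay and slows down every initial apply, which is why retrying with backoff inside the provider still seems like the better solution.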