terraform-linters/tflint-ruleset-aws

`aws_resource_missing_tags`: data source reference in tag value triggers issue instead of being ignored

tinder-tder opened this issue · 14 comments

I tried to update to the latest ruleset (0.24.x) and something has changed where it will error with missing tags when they exist.

Installing `aws` plugin...
Installed `aws` (source: github.com/terraform-linters/tflint-ruleset-aws, version: 0.24.0)
Plugin `terraform` is already installed
2 issue(s) found:

Notice: The resource is missing the following tags: "ContainingPII", "Creator", "Name", "Owner", "PublicFacing". (aws_resource_missing_tags)

  on kms.tf line 91:
  91:   tags = {
  92:     Name          = local.bucket_name
  93:     Creator       = var.creator_tag
  94:     Owner         = var.owner_tag
  95:     ContainingPII = var.containingpii_tag
  96:     PublicFacing  = var.publicfacing_tag
  97:   }

Reference: https://github.com/terraform-linters/tflint-ruleset-aws/blob/v0.24.0/docs/rules/aws_resource_missing_tags.md

Notice: The resource is missing the following tags: "ContainingPII", "Creator", "Name", "Owner", "PublicFacing". (aws_resource_missing_tags)

  on main.tf line 20:
  20:   tags = local.tags

Reference: https://github.com/terraform-linters/tflint-ruleset-aws/blob/v0.24.0/docs/rules/aws_resource_missing_tags.md

Downgrading to 0.23.0 works fine. I noticed there were recent changes noted int he changelog for 0.24.0 related to taht rule so I am assuming there was some breakage there

Example configuration that reproduces the issue is required to move forward. Avoid including any incidental configuration or dependencies.

sorry here is an example:

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  bucket_name = "aws-athena-query-results-${data.aws_caller_identity.current.id}-${data.aws_region.current.name}"
}

#tfsec:ignore:aws-s3-enable-bucket-logging tfsec:ignore:aws-s3-enable-versioning
resource "aws_s3_bucket" "bucket" {
  # Bucket names are in the format `aws-athena-query-results-000000000-us-east-1`
  bucket = local.bucket_name

  tags = {
    Name          = local.bucket_name
    Creator       = "foo"
    Owner         = "bar"
    ContainingPII = "baz"
    PublicFacing  = "bao"
  }
}

data "aws_iam_policy_document" "policy" {
  # Allow root account to manage the key
  statement {
    sid       = "AllowRootAccess"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
    condition {
      test     = "StringEquals"
      variable = "aws:principaltype"
      values   = ["Account"]
    }
  }
}

resource "aws_kms_key" "key" {
  description         = "Encryption key for Athena query results bucket ${local.bucket_name}"
  policy              = data.aws_iam_policy_document.policy.json
  is_enabled          = true
  enable_key_rotation = true

  tags = {
    Name          = local.bucket_name
    Creator       = "foo"
    Owner         = "bar"
    ContainingPII = "baz"
    PublicFacing  = "bao"
  }
}


Run with 0.24.0

1122 ~/temp % tflint             
2 issue(s) found:

Notice: The resource is missing the following tags: "ContainingPII", "Creator", "Name", "Owner", "PublicFacing". (aws_resource_missing_tags)

  on main.tf line 23:
  23:   tags = {
  24:     Name          = local.bucket_name
  25:     Creator       = "foo"
  26:     Owner         = "bar"
  27:     ContainingPII = "baz"
  28:     PublicFacing  = "bao"
  29:   }

Reference: https://github.com/terraform-linters/tflint-ruleset-aws/blob/v0.24.0/docs/rules/aws_resource_missing_tags.md

Notice: The resource is missing the following tags: "ContainingPII", "Creator", "Name", "Owner", "PublicFacing". (aws_resource_missing_tags)

  on main.tf line 56:
  56:   tags = {
  57:     Name          = local.bucket_name
  58:     Creator       = "foo"
  59:     Owner         = "bar"
  60:     ContainingPII = "baz"
  61:     PublicFacing  = "bao"
  62:   }

Reference: https://github.com/terraform-linters/tflint-ruleset-aws/blob/v0.24.0/docs/rules/aws_resource_missing_tags.md

1123 ~/temp % 

run with 0.23.0:

1124 ~/temp % tflint        
1125 ~/temp % tflint -v
TFLint version 0.47.0
+ ruleset.aws (0.23.0)
+ ruleset.terraform (0.4.0)
1126 ~/temp % 

.tflint conf

config {
  format              = "default"
  module              = true
  force               = false #false if you want to stop work
  disabled_by_default = false
}

plugin "aws" {
  deep_check = true
  enabled    = true
  version    = "0.24.0" #pinned due to https://github.com/terraform-linters/tflint-ruleset-aws/issues/510
  source     = "github.com/terraform-linters/tflint-ruleset-aws"
}

plugin "terraform" {
  preset  = "recommended"
  enabled = true
  version = "0.4.0"
  source  = "github.com/terraform-linters/tflint-ruleset-terraform"
}

rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Name", "Creator", "Owner", "ContainingPII", "PublicFacing"]
  #exclude = ["aws_autoscaling_group"] # (Optional) Exclude some resource types from tag checks
}

There are multiple potential causes here.

Go ahead and remove the locals and see if it passes. If so, a minimal reproduction would be something like:

locals {
  k = "v"
}

resource "aws_s3_bucket" "bucket" {
  tags = {
    Local = local.k
  }
}

There's a good chance this is not the case though, and the root cause here is the data source, in which case a reproduction would be:

data "aws_region" "current" {}

resource "aws_s3_bucket" "bucket" {
  tags = {
    Region = data.aws_region.current.name
  }
}

In the former case, the local value must be resolved, but the value can still be known through static analysis.

In the latter, the expression's value cannot be known until plan time. The rule would have to specifically evaluate the map's keys only, ignoring the unknown map values. While this should be possible, it's tricky and so seems more likely to be the issue here.

@bendrucker looks like you are spot on. If I set a static value to the local.bucket_name it passes fine with 0.24.x.

Not sure what changed in the logic on just checking key names vs not between versions. I will work on fixing locally with a workaround, but it seems you cant use any rendered data source value in the tag value. Even directly like this:

  on main.tf line 13:
  13:   tags = {
  14:     Name          = data.aws_caller_identity.current.id
  15:     Creator       = var.creator_tag
  16:     Owner         = var.owner_tag
  17:     ContainingPII = var.containingpii_tag
  18:     PublicFacing  = var.publicfacing_tag
  19:   }

the above passes in 0.23.1 but not 0.24.1

Ok great, we can take it from there.

@JorgeReus if you're inclined to look at this, I suspect this is caused by a lack of typing when evaluating the tags expression.

Before:

wantType := cty.Map(cty.String)
err := runner.EvaluateExpr(attribute.Expr, func(resourceTags map[string]string) error {
r.emitIssue(runner, resourceTags, config, attribute.Expr.Range())
return nil
}, &tflint.EvaluateExprOption{WantType: &wantType})

After:

https://github.com/terraform-linters/tflint-ruleset-aws/blob/5020f0d07740c1206553a810fa294323d4e399d0/rules/aws_resource_missing_tags.go#L197C1-L197C1

Hard to understand why wanting cty.Map(cty.String) would change behavior since it's still not explicitly trying to evaluate just the keys. But if this worked before, it seems worth a shot.

Going to check in a few, seems like you are right on the money @bendrucker

The test suite passes, I wanted to add a unit test for this case, but I can't put data sources in the test contents.

@bendrucker Could you point me on how I can correctly generate the provider locally?
Last I tried it couldn't find the rules in the compiled plugin

Found the issue, TL;DR when dynamic sources are involved (data sources, resource attributes) the rule isn't evaluated at all.

When doing local test with CGO_ENABLED=0 go build && mv tflint-ruleset-aws ~/.tflint.d/plugins/tflint-ruleset-aws(oddly enough make install doesn't work for me) in tag 0.23.x it shows this debug log
[DEBUG] plugin2host/client.go:397: unknown value found in main.tf:4,10-7,4 since the r.emitIssue is inside the evaluateExpr the code is not executed. In tag 0.24.1 since the r.emitIssue is outside the evaluateExpr block it evaluates an empty set of tags.

It gets worse in the default_tags scenario since the rule is evaluated in a resource by resource basis, since the rule won't be ignored and everything will have a missing tag issue, as an example:
With static tags:

[DEBUG] rules/aws_resource_missing_tags.go:112: Walk `%s` provider with tags `%v`: default=map[Region:1]
[DEBUG] rules/aws_resource_missing_tags.go:207: Walk `%s` resource: EXTRA_VALUE_AT_END=aws_s3_bucket.bucket

With dynamic tags

[DEBUG] plugin2host/client.go:398: unknown value found in main.tf:9,12-12,6
[DEBUG] rules/aws_resource_missing_tags.go:207: Walk `%s` resource: EXTRA_VALUE_AT_END=aws_s3_bucket.bucket

main.tf

data "aws_region" "current" {}

resource "aws_s3_bucket" "bucket" {
}

provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = {
      Region = data.aws_region.current.name
      # Region = "1"
    }
  }
}

.tflint.hcl

plugin "aws" {
  enabled    = true
}

rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Region"]
}

Created #511
But kinda worried on people running into this issue when using default_tags, @bendrucker let me know if we should consider other approaches or explicit documentation in https://github.com/terraform-linters/tflint-ruleset-aws/blob/master/docs/rules/aws_resource_missing_tags.md

can't put data sources in the test contents

This is not quite correct, you can declare data sources, but you will not be able to reference them. HCL is meant to be consumed through schemas. You don't dynamically decode, you tell the parser up-front about the structure of the document and have the option to either fully decode, erroring on any unknown content, or partially decode, ignoring any content not contained in the schema. TFLint does the latter, because it wants to enforce the things it knows about and ignore the things it doesn't.

In the past, TFLint directly depended on Terraform. Terraform which does full decoding, since you wouldn't want new features to silently fail if the config were used on an old Terraform version. That meant TFLint releases were tightly coupled to Terraform releases. Now, TFLint does its own (partial) module decoding. In turn, this means that features that Terraform supports have to be reverse-engineered from their low-level schema.

Data sources aren't included at all in the current parser:

https://github.com/terraform-linters/tflint/blob/0cbeee60b1a82680eff0c15580d37122b7bbf7cd/terraform/module.go#L13-L17

https://github.com/terraform-linters/tflint/blob/0cbeee60b1a82680eff0c15580d37122b7bbf7cd/terraform/module.go#L176-L197

But that's probably an oversight and should be changed, more to allow rules to inspect data sources than to address a case like this.

As you've noted, data sources aren't essential to reproducing this issue, as the difference between a data source, known at plan time, and a resource, known at apply time, is irrelevant here. They both are not statically evaluable and therefore "unknown."

when dynamic sources are involved (data sources, resource attributes) the rule isn't evaluated at all.

This is not literally true but I'll explain what you're getting at here. In a literal sense, the rule is still evaluated/executed. It just has to choose how to handle unknown values (resource and data source references).

This behavior of EvaluateExpr is well-documented:

https://github.com/terraform-linters/tflint-plugin-sdk/blob/d10b04fbb82b87caaf87704cd6f0a4f0220f2a62/tflint/interface.go#L163-L226

Particularly:

For functions (callbacks), the assigned value is used as an argument to execute the function. If a value cannot be assigned to the argument type, the execution is skipped instead of returning an error. This is useful when it's always acceptable to ignore exceptional values.

By passing a callback, you are telling the plugin SDK to ignore unknown value errors and only call your function when the expression can successfully be decoded into your function's argument type.

So it's a bug to assume that this callback will be called. If you must evaluate the expression for any following logic to proceed, you generally shouldn't use callbacks. In general, it's best to return early if an error exists since you can't proceed with unknown values.

Repeating for clarity based on the notes above. This rule was never capable of checking these tags. It didn't matter if tags matched your rule or didn't.

The regression here is that the unknown expression, a map with some unknown values, is mishandled. Previously, the error was properly handled and ignored.

Evaluating only the keys would be a new feature, likely requiring SDK-level enhancements. Terraform has to do something similar for for_each, which supports objects with known keys but unknown values. So in theory this should be possible.

Thank you very much for the clarification!

Would be awesome to support the checking of keys in the future, will dive deep into the core terraform code for it.

So, regarding the default tags, would be difficult to know whether the tags are ignored because of the uknown values or that it's and actual violation of the rule. So is there anything you would recommend or we we can proceed with the PR?

I'll have a look at the PR, but a note on your example, this shouldn't work literally:

data "aws_region" "current" {}

resource "aws_s3_bucket" "bucket" {
}

provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = {
      Region = data.aws_region.current.name
    }
  }
}

This effectively creates a circular reference, where the provider is configured with its own data source.

But if you used multiple aliased providers or another provider (e.g., the built-in terraform provider's remote_state data source), it would be valid. Terraform will tolerate dynamically configured providers as long as there's no cycle.

So for now we should just make sure that unknown values are ignored. A false positive is a bug, but some false negatives are inherent to static analysis. Static analysis can almost always be trivially defeated using valid configuration.

data "terraform_remote_state" "aws" {
  # ...
}

provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = data.terraform_remote_state.aws.outputs.default_tags
  }
}

TFLint is a developer productivity tool meant to catch errors early. Tools like TFLint and tfsec should not be used to enforce any critical security or organizational policy. That is the domain of tools like Sentinel and Open Policy Agent that are designed to evaluate the data (JSON) representation of a Terraform plan, as opposed to the input (HCL) configuration. We can/should try to incrementally detect more issues through static configuration analysis to maximize the early feedback developers can get. But past the limits of static analysis, we have to silently ignore by default. Users may not intuitively understand but there's not much we can do. Printing on every ignored expression would become unusably verbose in many real-world configurations.