ljfranklin/terraform-resource

terraform/metadata empty after repeated builds

kgugle opened this issue · 5 comments

Hey!

I have a pipeline that:

  1. Builds an EC2 instance A
  2. Provides the public IP of A to some internal resource B

When I run a build, the terraform/metadata file is correct and contains the output variables I've specified in my Terraform config. However, when I run the same build again, the file is empty as shown below. I'd like the second build to also include all the output variables (like the terraform CLI does), regardless of whether the infrastructure was updated or not.

// Build 1:
name: <secret-name>
metadata: {"instance_state":"running","public_dns":"ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com","public_ip":"xx.xxx.xx.xx"}

// Build 2 (no changes to infrastructure, just rebuilding):
name: <secret-name>
metadata: {}

Is this the intended behavior? And if so, do you know of any good ways to get around it?

I was thinking I could run terraform output afterwards, or directly grab the new .tfstate file from my S3 backend.
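For example, a rough sketch of the second option as a Concourse task (the image, bucket, key, and credential vars are placeholders; it assumes an image with the aws CLI and jq installed, and relies on the state file's top-level "outputs" key):

    - task: fetch-terraform-outputs
      config:
        platform: linux
        image_resource:
          type: registry-image
          source: {repository: my-aws-tools} # placeholder image with aws CLI + jq
        outputs:
          - name: tf-outputs
        params:
          AWS_ACCESS_KEY_ID: ((aws_access_key_id))
          AWS_SECRET_ACCESS_KEY: ((aws_secret_access_key))
        run:
          path: sh
          args:
            - -ec
            - |
              # pull the latest state directly from the S3 backend
              aws s3 cp s3://<my-bucket>/<my-key>.tfstate terraform.tfstate
              # extract the outputs block for downstream tasks to read
              jq '.outputs' terraform.tfstate > tf-outputs/metadata.json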

@kgugle the outputs should be present regardless of whether there's a diff or not. Could you share the relevant snippets of your pipeline config? Specifically, I'm curious whether this is one Concourse job that does a put to the Terraform resource followed by another task, or multiple Concourse jobs where the second job does a get on the Terraform resource.

I have the same issue.

Here's an abridged pipeline config:

  - name: terraform-plan-apply-dev
    serial: true
    serial_groups: [ dev-deploy ]
    on_failure: ((slack_notifications.put_failure_notification))
    on_error: ((slack_notifications.put_failure_notification))
    plan: ...
      - put: terraform
        params:
          plan_only: true
      - put: terraform

  - name: deploy-new-image-to-dev
    serial: true
    serial_groups: [ dev-deploy ]
    on_failure: ((slack_notifications.put_failure_notification))
    on_error: ((slack_notifications.put_failure_notification))
    plan:
      - get: terraform
      - task: ecs-update-service-if-needed # A task that references the terraform/metadata file
        config: *deploy_to_ecs_task
        params:
          CLUSTER: ...

Try adding - {get: terraform, passed: [terraform-plan-apply-dev]} to the deploy job and see if that avoids the issue. There might be a bug in our check implementation, but trying out that workaround would help get to the root cause.
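Based on your abridged config above, only the get line in the deploy job would change, roughly:

    - name: deploy-new-image-to-dev
      serial: true
      serial_groups: [ dev-deploy ]
      plan:
        - {get: terraform, passed: [terraform-plan-apply-dev]}
        - task: ecs-update-service-if-needed
          config: *deploy_to_ecs_task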

Same issue here:
Sometimes a job in Concourse only sees the second-to-last version of the Terraform resource. It is "properly" reported in the Concourse UI, i.e. the plan_only: true version shows up as the job input, even though a newer version exists.

We use {get: terraform, passed: [terraform-plan-apply-dev]}, but that didn't help.
This is probably because the plan_only version of the state was also a valid output of the previous job (terraform-plan-apply-dev in this example).

It would be great if the get operation at least allowed filtering on plan_only: false; our outputs rarely change, so that would make this issue go away for us.

I just pushed this change to the ljfranklin/terraform-resource:latest and ljfranklin/terraform-resource:0.14.5 images, which may address this issue. It seems like there's an ordering issue when a single job does multiple put steps to the resource: the first version produced gets detected as "latest" rather than the second. The workaround I pushed tries to sidestep this by ensuring the metadata file gets populated even if the first (plan) version is the one that gets pulled.

Another possible workaround is to change:

    plan: ...
      - put: terraform
        params:
          plan_only: true
      - put: terraform

to:

    plan: ...
      - put: terraform

That initial plan step doesn't seem to be buying you much anyway. Normally folks use a plan step to run against a PR branch without actually changing anything, or they run a plan job followed by a manually triggered apply job so that a human can review the plan output before triggering the apply; a sketch of that setup is below.
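Roughly like this (a sketch; job names are illustrative, and double-check the resource README for the exact put params in the version you're running):

    - name: terraform-plan-dev
      plan:
        - put: terraform
          params:
            plan_only: true # saves a plan without applying it

    - name: terraform-apply-dev
      plan:
        # no trigger: true here, so a human reviews the plan job's output
        # and manually kicks off this apply job
        - get: terraform
          passed: [terraform-plan-dev]
        - put: terraform
          params:
            plan_run: true # applies the previously saved plan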

Let me know if you're still seeing the issue after pulling the latest image in case I misunderstood the bug.