kubevela/terraform-controller

Kubevela Terraform application deletion is now failing to delete the underlying resources

suramasamy opened this issue · 9 comments

Kubevela Terraform application deletion is now failing to delete the underlying resources. It seems that the terraform destroy execution is being skipped.
We see the following in the logs:
I0822 06:58:26.487899 1 configuration_controller.go:175] "performing Configuration Destroy" Namespace="test" Name="bin-repo-app" JobName="95eda79d-7f06-4b9a-8a90-4634ef574905-destroy"
I0822 06:58:26.487911 1 status.go:16] "checking Terraform init and execution status" Namespace="vela-system" Job="95eda79d-7f06-4b9a-8a90-4634ef574905-destroy"
I0822 06:58:26.497866 1 configuration_controller.go:195] No need to execute terraform destroy command, because tfstate file not found: test/bin-repo-app

It looks like the issue was introduced by commit d2e6f21.
The actual secret where the state is stored is present in the vela-system namespace but the code seems to be checking in the namespace where the Kubevela application is created.
Could you please advise?

Can you please show the application/configuration you are using?

Please refer to the attached test component definition and application YAML files:
sample_comp.txt
sample_app.txt

Hi @chivalryq, I am on the same team as @suramasamy. It looks like tfState.Outputs is empty in our tfstate-default-uid secret, so this condition evaluates to false and terraform destroy is skipped. Note that our tfState does contain resources; it looks like the following:

{
  "version": 4,
  "terraform_version": "1.1.9",
  "serial": 4,
  "lineage": "90bc822c-6da1-e2f8-db84-f6d9d3f33a0a",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "alicloud_oss_bucket",
      "name": "bucket-acl",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "acl": "private",
            "bucket": "restore-example",
            "cors_rule": [],
            "creation_date": "2022-05-25",
            "extranet_endpoint": "oss-cn-beijing.aliyuncs.com",
            "force_destroy": false,
            "id": "restore-example",
            "intranet_endpoint": "oss-cn-beijing-internal.aliyuncs.com",
            "lifecycle_rule": [],
            "location": "oss-cn-beijing",
            "logging": [],
            "logging_isenable": null,
            "owner": "1170540042370241",
            "policy": "",
            "redundancy_type": "LRS",
            "referer_config": [],
            "server_side_encryption_rule": [],
            "storage_class": "Standard",
            "tags": {},
            "transfer_acceleration": [],
            "versioning": [],
            "website": []
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    }
  ]
}

Could you please advise how and where (in the source code) tfstate-default-uid is created and updated? I was not able to find it in configuration_controller.go.

@jaswalkiranavtar Hi, this state file is not generated by terraform-controller. It is generated by the terraform CLI when we specify kubernetes as the backend of a configuration. Check the Job execution process.

Thanks for the info @chivalryq. I understand that the state file is generated by the "terraform apply" command inside the Job container. My question, however, is: how is the state file converted to a secret?
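For anyone following along, here is a rough sketch of how a state payload can be read back out of such a secret. It assumes the Terraform kubernetes backend stores the state gzip-compressed under the secret's tfstate key, with the usual base64 encoding of secret data on top. The function names decodeState and encodeState are hypothetical helpers for illustration, not terraform-controller code.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// decodeState turns a base64-encoded, gzip-compressed payload back into
// raw state JSON. Assumption: this mirrors how the kubernetes backend
// stores the state in the secret's "tfstate" key.
func decodeState(b64 string) ([]byte, error) {
	compressed, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("base64 decode: %w", err)
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, fmt.Errorf("gzip reader: %w", err)
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// encodeState is the inverse, used here only to build a test payload.
func encodeState(state []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(state); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	enc, err := encodeState([]byte(`{"version": 4, "outputs": {}}`))
	if err != nil {
		panic(err)
	}
	dec, err := decodeState(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(dec))
}
```

If the secret is read through a typed Kubernetes client, the base64 layer is usually already removed, so only the gzip step would apply.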

@chivalryq I did a quick test by running "terraform apply" locally in the oamdev/docker-terraform:1.1.5 Docker image. The outputs in terraform.tfstate are empty because we don't have any outputs in our ComponentDefinition, as you can also see in the sample shared by @suramasamy above.

But the logic in isTFStateGenerated treats missing outputs as TFState not generated. Isn't that incorrect? It is perfectly normal for a terraform script to not have any outputs. Please let us know if you agree.
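To illustrate the point, here is a minimal sketch of a check that does not depend on outputs. It is not the controller's actual code: stateGenerated and hasResources are hypothetical names, and the struct models only the two state fields discussed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfState models only the fields of a Terraform state file used below.
type tfState struct {
	Outputs   map[string]json.RawMessage `json:"outputs"`
	Resources []json.RawMessage          `json:"resources"`
}

// stateGenerated (hypothetical) treats any non-empty, parseable state as
// generated, whether or not it declares outputs.
func stateGenerated(raw []byte) bool {
	if len(raw) == 0 {
		return false
	}
	var s tfState
	return json.Unmarshal(raw, &s) == nil
}

// hasResources (hypothetical) reports whether the state still tracks
// at least one resource that a destroy would need to remove.
func hasResources(raw []byte) bool {
	var s tfState
	if err := json.Unmarshal(raw, &s); err != nil {
		return false
	}
	return len(s.Resources) > 0
}

func main() {
	// Trimmed-down state with empty outputs but one managed resource.
	state := []byte(`{"version": 4, "outputs": {}, "resources": [{"type": "alicloud_oss_bucket"}]}`)
	fmt.Println(stateGenerated(state), hasResources(state))
}
```

With a state like ours (empty outputs, one resource), both checks report true, so destroy would not be skipped.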

Yes, previously we had no cases where a terraform script had no outputs, so we ignored this case. I agree with you.

@chivalryq We completed the testing today by adding an output to the ComponentDefinition as a workaround, and terraform destroy is working as expected now.

Created this PR to remove the condition that checks for outputs.

@jaswalkiranavtar Thanks for the contribution! I commented on the PR. PTAL.