minamijoyo/tfmigrate

Bug: Error when Terraform Cloud is the remote backend

GoodmanBen opened this issue · 6 comments

Current State:
Terraform Cloud is an enterprise cloud solution from HashiCorp that includes a remote state backend. When Terraform Cloud is specified as the remote backend, running terraform init with the -reconfigure flag fails. Because tfmigrate runs terraform init with -reconfigure, this error prevents tfmigrate from working with Terraform Cloud as a backend.


Solution Plan:
When Terraform Cloud is the desired backend, running terraform init without the -reconfigure flag lets tfmigrate work exactly as intended. If the user sets a new parameter, "is_backend_terraform_cloud", to true in their configuration file, tfmigrate will run that init command in place of the c.Init(ctx, "-input=false", "-no-color", "-reconfigure") call it makes today.
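A minimal Go sketch of the proposed switch, assuming a config struct with a boolean that mirrors the proposed is_backend_terraform_cloud parameter. MigrationConfig and initArgs are hypothetical names for illustration, not tfmigrate's actual code; the only behavior they capture is dropping -reconfigure when the flag is set.

```go
package main

import "fmt"

// MigrationConfig is a hypothetical stand-in for tfmigrate's configuration.
// IsBackendTerraformCloud mirrors the proposed is_backend_terraform_cloud parameter.
type MigrationConfig struct {
	IsBackendTerraformCloud bool
}

// initArgs builds the arguments for `terraform init` (hypothetical helper).
// -reconfigure is omitted for Terraform Cloud because, as reported in this
// issue, terraform init rejects it when the cloud block is in use.
func initArgs(cfg MigrationConfig) []string {
	args := []string{"-input=false", "-no-color"}
	if !cfg.IsBackendTerraformCloud {
		args = append(args, "-reconfigure")
	}
	return args
}

func main() {
	fmt.Println(initArgs(MigrationConfig{}))                              // [-input=false -no-color -reconfigure]
	fmt.Println(initArgs(MigrationConfig{IsBackendTerraformCloud: true})) // [-input=false -no-color]
}
```

Keeping the default path unchanged means existing users of other backends see no difference unless they opt in.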

Assigning myself to work on this issue.

Hi @GoodmanBen, thank you for reporting this.

There are two types of backend implementations for Terraform Cloud.
(1) What Terraform version are you using?
(2) Which are you using: the remote backend, or the cloud block introduced in Terraform 1.1+?
https://www.terraform.io/language/settings/backends/remote
https://www.terraform.io/language/settings/terraform-cloud

I'm currently using Terraform 1.1.7 with the cloud block syntax. Perhaps a better name for the new parameter would be something like is_backend_cloud rather than is_backend_terraform_cloud? I'm also open to other naming suggestions.

Hi @GoodmanBen, I was curious where the error comes from, so I read the code in Terraform core and realized that this issue is not as simple as expected.

The error comes from here:
https://github.com/hashicorp/terraform/blob/v1.1.7/internal/command/meta_backend.go#L1412

If my understanding is correct, this issue only affects the cloud block introduced in Terraform 1.1+, because of explicit validation of init options that prevents misuse of the cloud block:
hashicorp/terraform#29940

The problem is that the cloud block is defined outside the backend block. In other words, we cannot override the cloud block with a backend "local" block in an override file.
https://www.terraform.io/language/files/override#merging-terraform-blocks
https://github.com/minamijoyo/tfmigrate/blob/v0.3.1/tfexec/terraform.go#L205-L208

The documentation also says:

https://www.terraform.io/language/settings/terraform-cloud

You cannot use the CLI integration and a state backend in the same configuration; they are mutually exclusive.

This means that even if it appears to work for you when you simply remove the -reconfigure option from terraform init in switchBackToRemotekFunc(), it relies on unsupported behavior: using both the cloud block and the backend "local" block in the same configuration via an override file. More research is needed on how to fix this.

Hi @minamijoyo, thanks for such a detailed write-up; it is improving my understanding of both tfmigrate and Terraform.

I agree with you that there is more going on here. What appears to be happening is this:

tfmigrate creates the override file, and when it runs terraform init -input=false -no-color -reconfigure, the backend switches to local successfully. This happens despite what Terraform documents in its merging-blocks and terraform-cloud docs. This leads me to think there is an issue either in Terraform's behavior or in its documentation; either way, something is amiss. I ran the following tests to reach this conclusion.

Test number one:
I was able to reproduce this behavior successfully with the terraform CLI using an override.tf file of:

terraform {
  backend "local" {
  }
}

and a main.tf file containing:

terraform {
  required_version = "~> 1.1.7"

  required_providers {
    // providers
  }

  cloud {
    organization = "my-org"

    workspaces {
      name = "my-workspace-dev"
    }
  }
}

Test number two:
Running terraform init with only a main.tf file, specified as follows:

terraform {
  backend "local" {
  }
  required_version = "~> 1.1.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~>4.12.0"
    }
  }

  cloud {
    organization = "my-org"

    workspaces {
      name = "my-workspace-dev"
    }
  }
}

This results in the anticipated error. This leads me to suspect that the override functionality is either mis-documented or buggy.
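A toy Go model of the rule that test two trips: within one non-override configuration, a terraform block may declare a backend block or a cloud block, but not both. TerraformBlock and validatePrimary are hypothetical simplifications, not Terraform's actual validation code or error message.

```go
package main

import (
	"errors"
	"fmt"
)

// TerraformBlock is a hypothetical, simplified terraform block;
// a nil pointer means the nested block is absent.
type TerraformBlock struct {
	Backend *string // backend type, e.g. "local"
	Cloud   *string // Terraform Cloud organization
}

// validatePrimary models the documented rule that the cloud block and a
// state backend are mutually exclusive in the same configuration.
func validatePrimary(tb TerraformBlock) error {
	if tb.Backend != nil && tb.Cloud != nil {
		return errors.New("backend and cloud blocks are mutually exclusive")
	}
	return nil
}

func main() {
	local, org := "local", "my-org"
	fmt.Println(validatePrimary(TerraformBlock{Backend: &local, Cloud: &org})) // test two: error
	fmt.Println(validatePrimary(TerraformBlock{Cloud: &org}))                  // cloud only: <nil>
}
```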

Test number three:
main.tf:

terraform {
  backend "local" {
  }
  required_version = "~> 1.1.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~>4.12.0"
    }
  }
}

override.tf:

terraform {
  cloud {
    organization = "my-workspace-cloud"

    workspaces {
      name = "my-workspace-dev"
    }
  }
}

Running terraform init with this combination initializes a Terraform Cloud backend.

As a result, I suspect the issue is that Terraform's documentation is not up to date with the implemented behavior for overriding the backend block with the cloud block and vice versa. I'm looking into the Terraform source code now to confirm that the documentation is out of sync.

The behavior observed with the originally proposed solution does appear to be exactly what the code base intends:
https://github.com/hashicorp/terraform/blob/main/internal/configs/module.go#L379

When the override.tf file contains a cloud block (or a backend block), the mutually exclusive backend block (or cloud block, respectively) from the base configuration is set to nil, and the effective backend is whichever one the override.tf file specifies.
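The override rule described above can be modeled in a few lines of Go. This is a simplified sketch, not the code in module.go: TerraformBlock, mergeOverride, and describe are hypothetical, and the model ignores the case of an override file that itself declares both blocks.

```go
package main

import "fmt"

// TerraformBlock is a hypothetical, simplified terraform block;
// a nil pointer means the nested block is absent.
type TerraformBlock struct {
	Backend *string // backend type, e.g. "local"
	Cloud   *string // Terraform Cloud organization
}

// mergeOverride applies an override file's terraform block to the base block.
// Whichever of backend/cloud the override declares wins, and the mutually
// exclusive setting inherited from the base is cleared.
func mergeOverride(base, override TerraformBlock) TerraformBlock {
	merged := base
	if override.Backend != nil {
		merged.Backend = override.Backend
		merged.Cloud = nil
	}
	if override.Cloud != nil {
		merged.Cloud = override.Cloud
		merged.Backend = nil
	}
	return merged
}

// describe reports which backend is effective after merging.
func describe(tb TerraformBlock) string {
	switch {
	case tb.Backend != nil:
		return "backend " + *tb.Backend
	case tb.Cloud != nil:
		return "cloud " + *tb.Cloud
	default:
		return "none"
	}
}

func main() {
	local, org := "local", "my-org"
	// Test one: cloud in main.tf, backend "local" in override.tf.
	fmt.Println(describe(mergeOverride(TerraformBlock{Cloud: &org}, TerraformBlock{Backend: &local}))) // backend local
	// Test three: backend "local" in main.tf, cloud in override.tf.
	fmt.Println(describe(mergeOverride(TerraformBlock{Backend: &local}, TerraformBlock{Cloud: &org}))) // cloud my-org
}
```

This matches tests one and three: the override file decides the backend, which is why tfmigrate's backend "local" override works on top of a cloud block.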

I'm going to put in a PR to update Terraform's documentation on overriding the terraform block. In summary, @minamijoyo, it looks like the proposed solution is supported, just not well documented (yet 😄).

Thank you for the investigation. The behavior looks intentional 😉