Bug: Error when Terraform Cloud is Remote Backend
GoodmanBen opened this issue · 6 comments
Current State:
Terraform Cloud is an enterprise cloud solution from HashiCorp that includes a remote state backend. When initializing Terraform, the -reconfigure flag does not work when the remote backend is Terraform Cloud. This error occurs here, and it prevents tfmigrate from working with Terraform Cloud as a backend.
Solution Plan:
When Terraform Cloud is the desired backend, running the initialization command without the -reconfigure flag leads to tfmigrate running exactly as desired. If the user sets a new parameter, "is_backend_terraform_cloud", to true in their configuration file, then init is run without -reconfigure, in place of the c.Init(ctx, "-input=false", "-no-color", "-reconfigure") command that is run currently.
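A minimal sketch of the proposed switch, assuming a hypothetical helper named initArgs and a boolean that mirrors the proposed "is_backend_terraform_cloud" setting. This is not tfmigrate's actual code; it only illustrates dropping -reconfigure when the backend is Terraform Cloud:

```go
package main

import "fmt"

// initArgs returns the arguments to pass to `terraform init`.
// Hypothetical helper: the name and signature are assumptions for illustration.
func initArgs(isBackendTerraformCloud bool) []string {
	args := []string{"-input=false", "-no-color"}
	if !isBackendTerraformCloud {
		// -reconfigure is only added when the backend is not Terraform Cloud,
		// since the flag is reported above as failing in that case.
		args = append(args, "-reconfigure")
	}
	return args
}

func main() {
	fmt.Println(initArgs(false)) // arguments used today
	fmt.Println(initArgs(true))  // arguments proposed for Terraform Cloud
}
```

The resulting slice could then be passed on to the existing init call, so only the argument list changes between the two cases.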
Assigning myself to work on this issue.
Hi @GoodmanBen, Thank you for reporting this.
There are two types of backend implementations for Terraform Cloud.
(1) Which Terraform version are you using?
(2) Which backend are you using: remote, or cloud (introduced in Terraform 1.1+)?
https://www.terraform.io/language/settings/backends/remote
https://www.terraform.io/language/settings/terraform-cloud
I'm currently using Terraform 1.1.7 with the cloud backend syntax. Perhaps a better name for the new parameter would be something like is_backend_cloud as opposed to is_backend_terraform_cloud? I'm also open to other naming suggestions.
Hi @GoodmanBen, I was curious where the error comes from, so I read the code in Terraform core and realized that this issue isn't as simple as expected.
The error comes from here:
https://github.com/hashicorp/terraform/blob/v1.1.7/internal/command/meta_backend.go#L1412
If my understanding is correct, this issue only affects the cloud block introduced in Terraform 1.1+, due to explicit validation of init options that prevents misuse of the cloud block:
hashicorp/terraform#29940
The problem is that the cloud block is defined outside of the backend block. That is, we cannot override the cloud block with a backend "local" block using an override file.
https://www.terraform.io/language/files/override#merging-terraform-blocks
https://github.com/minamijoyo/tfmigrate/blob/v0.3.1/tfexec/terraform.go#L205-L208
The document also says:
https://www.terraform.io/language/settings/terraform-cloud
You cannot use the CLI integration and a state backend in the same configuration; they are mutually exclusive.
This means that even if it appears to work for you after just removing the -reconfigure option from terraform init in switchBackToRemoteFunc(), it depends on unsupported behavior: using both the cloud block and the backend "local" block in the same configuration via an override. It seems that more research is needed on how to fix it.
Hi @minamijoyo, thanks for such a detailed write-up; this is increasing my understanding of both tfmigrate and Terraform.
I agree with you that there is more going on here. What appears to be happening is this:
The override block is created within tfmigrate, and when the command terraform init -input=false -no-color -reconfigure is run, the backend is able to switch to the local backend successfully. This occurs despite what Terraform documents in the merging-blocks and terraform-cloud docs. This leads me to think there is an issue either in Terraform's functionality or in its documentation; either way, something is amiss. The following tests were run to reach this conclusion.
Test number one:
I've been able to replicate this behavior successfully in the Terraform CLI with an override.tf file of:
terraform {
  backend "local" {
  }
}
and a main.tf file containing:
terraform {
  required_version = "~> 1.1.7"
  required_providers {
    // providers
  }
  cloud {
    organization = "my-org"
    workspaces {
      name = "my-workspace-dev"
    }
  }
}
Test number two:
Running terraform init with only main.tf and the specification as follows:
terraform {
  backend "local" {
  }
  required_version = "~> 1.1.7"
  required_providers {
    google = {
      source = "hashicorp/google"
      version = "~>4.12.0"
    }
  }
  cloud {
    organization = "my-org"
    workspaces {
      name = "my-workspace-dev"
    }
  }
}
This results in the anticipated error. It raises the suspicion that the override functionality is either mis-documented or buggy.
Test number three:
main.tf:
terraform {
  backend "local" {
  }
  required_version = "~> 1.1.7"
  required_providers {
    google = {
      source = "hashicorp/google"
      version = "~>4.12.0"
    }
  }
}
override.tf:
terraform {
  cloud {
    organization = "my-workspace-cloud"
    workspaces {
      name = "my-workspace-dev"
    }
  }
}
When running terraform init, this combination results in a Terraform Cloud backend being initialized.
As a result, I suspect that the issue lies with Terraform's documentation not being up to date with the implemented behavior for overriding the backend block with the cloud block and vice versa. I'm looking into the Terraform source code now to confirm this.
The behavior observed with the originally identified solution does indeed appear to be exactly what the code base intends:
https://github.com/hashicorp/terraform/blob/main/internal/configs/module.go#L379
When a cloud block or backend block exists in the override.tf file, the mutually exclusive counterpart (backend or cloud, respectively) is set to nil, and the backend is taken to be whichever one is specified in the override.tf file.
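The merge rule described above can be modeled with a short sketch. The types and the mergeOverride method below are simplified stand-ins invented for illustration; they are not Terraform's internal types or function signatures, only a model of "the override wins and the other block is cleared":

```go
package main

import "fmt"

// Backend and CloudConfig are simplified, hypothetical stand-ins for
// Terraform's internal configuration types.
type Backend struct{ Type string }
type CloudConfig struct{ Organization string }

// Module holds the effective configuration: at most one of the two is set.
type Module struct {
	Backend     *Backend
	CloudConfig *CloudConfig
}

// File models the terraform block found in a single override file.
type File struct {
	Backend     *Backend
	CloudConfig *CloudConfig
}

// mergeOverride applies an override file: whichever block the override
// declares wins, and its mutually exclusive counterpart is set to nil.
func (m *Module) mergeOverride(f File) {
	if f.Backend != nil {
		m.CloudConfig = nil
		m.Backend = f.Backend
	}
	if f.CloudConfig != nil {
		m.Backend = nil
		m.CloudConfig = f.CloudConfig
	}
}

// describe reports which kind of state storage is in effect.
func describe(m Module) string {
	switch {
	case m.Backend != nil:
		return "backend " + m.Backend.Type
	case m.CloudConfig != nil:
		return "cloud " + m.CloudConfig.Organization
	default:
		return "none"
	}
}

func main() {
	// Test one: main.tf has a cloud block, override.tf has backend "local".
	m := Module{CloudConfig: &CloudConfig{Organization: "my-org"}}
	m.mergeOverride(File{Backend: &Backend{Type: "local"}})
	fmt.Println(describe(m))

	// Test three: main.tf has backend "local", override.tf has a cloud block.
	m = Module{Backend: &Backend{Type: "local"}}
	m.mergeOverride(File{CloudConfig: &CloudConfig{Organization: "my-workspace-cloud"}})
	fmt.Println(describe(m))
}
```

Under this model, tests one and three both end with exactly one of the two blocks set, which matches what was observed when running terraform init.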
I'm going to put in a PR to update Terraform's documentation on overriding terraform blocks. In summary, @minamijoyo, it looks like the identified solution is supported, just not well documented (yet 😄).
Thank you for the investigation. The behavior looks intentional 😉