GoogleCloudPlatform/cloud-foundation-fabric

Modification of Google-managed domains using net-lb-app-ext is currently not possible without workarounds

nika-pr opened this issue · 5 comments

Describe the bug
When using the net-lb-app-ext module, modifying the list of domains currently does not work without an intermediate apply of removing HTTPS+ssl_certificates properties.

Environment

Terraform v1.5.7
on windows_amd64
+ provider registry.terraform.io/hashicorp/google v5.21.0
+ provider registry.terraform.io/hashicorp/google-beta v5.21.0

To Reproduce

I am using the net-lb-app-ext module. The relevant Terraform lines are the following:

module "lp_gl_ext_app_lb" {
  source              = "github.com/GoogleCloudPlatform/cloud-foundation-fabric//modules/net-lb-app-ext?ref=v29.0.0"
  project_id          = var.project_id
  name                = var.lb_name
  use_classic_version = false
  protocol = "HTTPS"
  ssl_certificates = {
    managed_configs = {
      default = {
        domains = var.domains
      }
    }
  }

Modifying the value of var.domains yields the mentioned error.

Expected behavior

  1. A new google_compute_managed_ssl_certificate resource with a different name should be created.
  2. The load balancer frontend should be modified to use the new resource.
  3. The old certificate resource should be destroyed.

Result
Plan:

 # module.lp_gl_ext_app_lb.google_compute_managed_ssl_certificate.default["default"] must be replaced
+/- resource "google_compute_managed_ssl_certificate" "default" {
      ~ certificate_id            = <REDACTED>-> (known after apply)
      ~ creation_timestamp        = "2024-03-28T05:10:00.312-07:00" -> (known after apply)
      + expire_time               = (known after apply)
      ~ id                        = "projects/<PROJECT_ID>/global/sslCertificates/<LB_NAME>-1-default" -> (known after apply)
        name                      = "tf-lb-dev-se-landing-pages-1-default"
      ~ self_link                 = "https://www.googleapis.com/compute/v1/projects/<PROJECT_ID>/global/sslCertificates/<LB_NAME>-default" -> (known after apply)
      ~ subject_alternative_names = [] -> (known after apply)
        # (2 unchanged attributes hidden)

      ~ managed {
          ~ domains = [ # forces replacement
                "<DOMAIN_1>",
              - "<DOMAIN_2>",
              + "<DOMAIN_3>",
            ]
        }
    }

  # module.lp_gl_ext_app_lb.google_compute_target_https_proxy.default[0] will be updated in-place
  ~ resource "google_compute_target_https_proxy" "default" {
        id                               = "projects/<PROJECT_ID>/global/targetHttpsProxies/<LB_NAME>"
        name                             = "<LB_NAME>"
      ~ ssl_certificates                 = [
          - "https://www.googleapis.com/compute/v1/projects/<PROJECT_ID>/global/sslCertificates/<LB_NAME>-default",
        ] -> (known after apply)
        # (10 unchanged attributes hidden)
    }

Plan: 1 to add, 1 to change, 1 to destroy.

Error during apply:

Error creating ManagedSslCertificate: googleapi: Error 409: The resource 'projects/<PROJECT_ID>/global/sslCertificates/<LB_NAME>-default' already exists, alreadyExists
│
│   with module.lp_gl_ext_app_lb.google_compute_managed_ssl_certificate.default["default"],
│   on .terraform\modules\lp_gl_ext_app_lb\modules\net-lb-app-ext\main.tf line 61, in resource "google_compute_managed_ssl_certificate" "default":
│   61: resource "google_compute_managed_ssl_certificate" "default" {

Obviously creating a new certificate with the same name as the existing one will not work. This should be changed.

Thank you very much for the report.

This is considered missing documentation issue and is tracked in #2093 (see also discussion in #2080)

With certificates the situation is a bit harder, because we need to recreate the certificate resource - and we cannot create a new one, because old one already exists.

What you need to do here is, given the starting point of:

  ssl_certificates = {
    managed_configs = {
      default = {
        domains = var.domains
      }
    }
  }

Change that to:

  ssl_certificates = {
   managed_configs = {
     default_v2 = {
       domains = var.domains
     }
   }
 }

So you will request creation of a new certificate by terraform before removing the old one. Though this will introduce unavailability of the service, as the new certificate will not be provisioned right away.

From this perspective, it may be the best, to create an intermediary state:

  ssl_certificates = {
    managed_configs = {
      default = {
        domains = var.old_domains
      }
      default-v2 = {
        domains = var.new_domains
      }

    }
  }

Which will provision and add new certificate, and then - once new one is provisioned and confirmed to be working, remove the old one.

Thank you for the prompt response!

I'll try the workaround. I would, however, like to find a way so that I don't have to manually update keys if the variable values change. What's the plan in this scenario?

I was thinking of creating a pull request that adds a random suffix to the name of the certificate with a recreate trigger depending on the domains. Would that pull request have a chance of getting merged? I'm afraid it would introduce a breaking change to the module 🤔

I was thinking of creating a pull request that adds a random suffix to the name of the certificate with a recreate trigger depending on the domains. Would that pull request have a chance of getting merged? I'm afraid it would introduce a breaking change to the module 🤔

Generally backwards incompatible changes are allowed and I think, you can even aim for backward compatibility with this PR (though deployment will still be disruptive to the service).

You can check the code in the #2080 PR. Using terraform_data and random may result in the solution that you're expecting, though this will allow for an easy foot-shooting when handling production environments and introducing disruption to the services. That's why we opted earlier to document better these scenarios and how this should be executed without introducing disruption to the service, instead of going forward with the approach outlined in #2080 .

Closing this as Wiktor outlined our preferred solution. Feel free to reopen if you feel we should continue discussing this, and thanks for bringing this to our attention!