GoogleCloudPlatform/cloud-foundation-fabric

phpipam set to "INTERNAL" for load balancing crashes the terraform creation

PapaPeskwo opened this issue · 7 comments

Describe the bug
When trying to deploy the phpipam blueprint with phpipam_exposure set to "INTERNAL", I get:

╷
│ Error: Error waiting for Create Service Networking Connection: Error code 9, message: Cannot modify allocated ranges in CreateConnection. Please use UpdateConnection.
│ Help Token: ARqICRNc_ZjutDQEegxZ7rmTl7EVY2xyn7EAOHEYAcrhELm4NJQ366zGs9eWeL4z87LeWvGG4OJODdjxclpnEAIYno1KwVP-VMVCBC6jZiTA996s
│ 
│   with module.vpc[0].google_service_networking_connection.psa_connection["servicenetworking.googleapis.com"],
│   on ../../../modules/net-vpc/psa.tf line 61, in resource "google_service_networking_connection" "psa_connection":
│   61: resource "google_service_networking_connection" "psa_connection" {
│ 
╵

Nothing is inside the project.
This is the terraform.tfvars:

#TESTING
prefix     = "test"
project_id = "<project>"

region           = "europe-north1"
phpipam_exposure = "INTERNAL"

I tried changing the IP addresses in ilb.tf:

  default = {
    connector = "10.8.7.0/28"
    psa       = "10.60.7.0/24"
    ilb       = "10.128.7.0/28"
  }

but that did not help.

Environment
I tried with two environments:

Terraform v1.7.5
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v5.25.0
+ provider registry.terraform.io/hashicorp/google-beta v5.25.0
+ provider registry.terraform.io/hashicorp/random v3.6.1
+ provider registry.terraform.io/hashicorp/tls v4.0.5

Your version of Terraform is out of date! The latest version
is 1.8.0. You can update by downloading from https://www.terraform.io/downloads.html

Terraform v1.8.0
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v5.25.0
+ provider registry.terraform.io/hashicorp/google-beta v5.25.0
+ provider registry.terraform.io/hashicorp/random v3.6.1
+ provider registry.terraform.io/hashicorp/tls v4.0.5

Your version of Terraform is out of date! The latest version
is 1.8.1. You can update by downloading from https://www.terraform.io/downloads.html

git rev-parse --short HEAD
f22837cd

To Reproduce
Set the phpipam_exposure = "INTERNAL" in your tfvars file.

Expected behavior
Everything to be set up without errors.

Result

╷
│ Error: Error waiting for Create Service Networking Connection: Error code 9, message: Cannot modify allocated ranges in CreateConnection. Please use UpdateConnection.
│ Help Token: ARqICRNc_ZjutDQEegxZ7rmTl7EVY2xyn7EAOHEYAcrhELm4NJQ366zGs9eWeL4z87LeWvGG4OJODdjxclpnEAIYno1KwVP-VMVCBC6jZiTA996s
│ 
│   with module.vpc[0].google_service_networking_connection.psa_connection["servicenetworking.googleapis.com"],
│   on ../../../modules/net-vpc/psa.tf line 61, in resource "google_service_networking_connection" "psa_connection":
│   61: resource "google_service_networking_connection" "psa_connection" {
│ 
╵

Additional context

I manually tried to create a Private Service Connect endpoint, to no avail.

@simonebruzzechesse can you TAL?

Hi @PapaPeskwo! Thanks for reaching out. I just reviewed the code and noticed a couple of issues with the internal exposure of the application: one was a regression caused by a module upgrade, and the other was probably a missing proxy subnet in the VPC. You can find the latest version in my PR #2226.
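For readers unfamiliar with the proxy subnet mentioned above: regional internal Application Load Balancers need a proxy-only subnet in the same region and VPC. The following is a minimal sketch using the plain google_compute_subnetwork resource, not the code from the PR. The resource names, the network name, and the CIDR range are all assumptions chosen for illustration.

```hcl
# Illustrative sketch only; names and the CIDR range are hypothetical,
# and this is not the blueprint's actual configuration.
resource "google_compute_network" "vpc" {
  name                    = "example-vpc"
  auto_create_subnetworks = false
}

# Proxy-only subnet reserved for the load balancer's managed proxies.
resource "google_compute_subnetwork" "proxy_only" {
  name          = "example-proxy-only"
  region        = "europe-north1"
  network       = google_compute_network.vpc.id
  ip_cidr_range = "10.129.0.0/23"
  purpose       = "REGIONAL_MANAGED_PROXY"
  role          = "ACTIVE"
}
```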

During my tests I didn't hit the issue you are reporting. I did some research on the error, and it might be caused by trying to set up a private service connection in a VPC that already has an existing connection (not sure if that is the case here). Could you please try to set up the phpIPAM service in a brand new VPC or project and let me know whether the version of the code in the PR works properly?

Hello @simonebruzzechesse
I remember getting different error codes a month back when I tried deploying with the "INTERNAL" variable set, most likely the ones you're describing. Unfortunately I can't find my notes from then, but I assume they're the same errors you experienced.
Regarding the connection, I can't find anything left over after destroying everything. I'm not sure whether I'm looking in the wrong place or what the issue is. I'll definitely try to set everything up again with the fixes you've made and get back to you afterwards.

Also, thanks for the quick fix :)

Hi @PapaPeskwo, yes, I'm quite sure you spotted the errors that should now be fixed as of today. The connection error is strange; I would still recommend setting everything up again from scratch and double-checking whether you still experience the same issue. Then let me know! Thank you for opening the issue and letting us know about those errors!

Hi again @simonebruzzechesse
I tried deploying to a different project and it worked!

I was also successful in using this command to fix the project that had issues:

gcloud beta services vpc-peerings update \
	--service=servicenetworking.googleapis.com \
	--ranges=[your-private-connection-range-name] \
	--network=[your-vpc-name] \
	--project=[your-project-id] \
	--force

This issue was discussed here and here. Perhaps something that can be added to the README? (I can open a PR with that added).

Another note: when destroying, I get:

│ Error: Unable to remove Service Networking Connection, err: Error waiting for Delete Service Networking Connection: Error code 9, message: Failed to delete connection; Producer services (e.g. CloudSQL, Cloud Memstore, etc.) are still using this connection.

Manually deleting the VPC network peering works. Is this something that can be solved in the Terraform configuration?

Either way it works now and I'm very grateful.
Thank you so much :)

Hi @PapaPeskwo, thanks for all the info you shared, and I'm happy the newer version worked properly. That really looks like some kind of issue with PSA, so I don't think we can manage it differently via Terraform. Of course, feel free to add a dedicated section to the README.md warning users about this possible issue when deleting and re-creating the blueprint. That would be appreciated.

I also removed the ingress setting "all", which was causing the Cloud Run service to remain accessible via its default URL even when deployed privately.
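To illustrate the kind of change involved, here is a hedged sketch using the google_cloud_run_v2_service resource directly, where the ingress argument replaces the permissive INGRESS_TRAFFIC_ALL value with one that only admits internal and load-balancer traffic. The blueprint goes through its own Cloud Run module, so the resource name, service name, and the sample container image below are assumptions, not the actual change in the PR.

```hcl
# Illustrative sketch only; the service name and image are placeholders,
# and the blueprint's own module may expose this setting under another name.
resource "google_cloud_run_v2_service" "phpipam" {
  name     = "example-phpipam"
  location = "europe-north1"

  # Restrict ingress instead of using INGRESS_TRAFFIC_ALL.
  ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    containers {
      # Google's public sample image, used here purely as a stand-in.
      image = "us-docker.pkg.dev/cloudrun/container/hello"
    }
  }
}
```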

Regarding the latest issue with deprovisioning resources, we were able to identify (and solve) it: we were missing the deletion_policy parameter in the google_service_networking_connection Terraform resource. Now everything should work properly; please let me know if you manage to give it a try and check that everything is fine. I just closed the PR, so you can reference the main branch now. Thanks for sharing, and I hope you won't experience any further issues in your next deployment :)
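For reference, a minimal sketch of what setting deletion_policy on this resource can look like. The thread does not state which value the module uses; "ABANDON" is assumed here because it is the value the provider documents for dropping the connection from Terraform state on destroy rather than calling the delete API. Resource names and the address range are hypothetical, and the connection itself is left in place in the project.

```hcl
# Illustrative sketch only; names are hypothetical and this is not
# the net-vpc module's actual code.
resource "google_compute_network" "vpc" {
  name                    = "example-vpc"
  auto_create_subnetworks = false
}

# Range reserved for private services access (PSA).
resource "google_compute_global_address" "psa_range" {
  name          = "example-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  address       = "10.60.7.0"
  prefix_length = 24
  network       = google_compute_network.vpc.id
}

resource "google_service_networking_connection" "psa_connection" {
  network                 = google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]

  # Assumed value: on destroy, stop managing the connection instead of
  # deleting it, which avoids the "Producer services are still using
  # this connection" failure.
  deletion_policy = "ABANDON"
}
```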