hashicorp/terraform

'google_compute_instance' adding 'google_compute_disk' forces new resource

advdv opened this issue · 14 comments

advdv commented

Hey, I've encountered unexpected behaviour while updating a google_compute_instance resource by adding a new disk {} block.

Terraform Version

Terraform v0.6.16

Affected Resource(s)

  • google_compute_instance
  • google_compute_disk

Terraform Configuration Files

Consider starting out with the following configuration:

variable "gce_ssh_user" {}
variable "gce_ssh_pub_key_file" {}
variable "gce_service_account" {}
variable "gce_project_id" {}
variable "gce_image" {}
variable "gce_region" {
  default = "europe-west1"
}

variable "gce_network" {}
variable "gce_subnetwork" {}

provider "google" {
  credentials = "${file(var.gce_service_account)}"
  project     = "${var.gce_project_id}"
  region      = "${var.gce_region}"
}

resource "google_compute_disk" "default" {
  name = "test-disk"
  type = "pd-ssd"
  zone = "${var.gce_region}-b"
  size = 10
}

resource "google_compute_instance" "remove-me" {
  name           = "disk-test"
  machine_type   = "n1-standard-1"
  can_ip_forward = false
  zone           = "${var.gce_region}-b"

  disk {
    image = "ubuntu-1604-xenial-v20160429"
  }

  network_interface {
    subnetwork = "${var.gce_subnetwork}"
    access_config {
      // Ephemeral IP
    }
  }

  service_account {
    scopes = ["compute-rw"] 
  }

  metadata {
    sshKeys = "${var.gce_ssh_user}:${file(var.gce_ssh_pub_key_file)}"
  }
}

When using the GCE web interface I can add the disk 'test-disk' to the instance 'remove-me' without recreating it (or even rebooting it). I would expect Terraform to behave the same when changing the resource configuration to the following (a sketch of the equivalent API call follows this configuration):

variable "gce_ssh_user" {}
variable "gce_ssh_pub_key_file" {}
variable "gce_service_account" {}
variable "gce_project_id" {}
variable "gce_image" {}
variable "gce_region" {
  default = "europe-west1"
}

variable "gce_network" {}
variable "gce_subnetwork" {}

provider "google" {
  credentials = "${file(var.gce_service_account)}"
  project     = "${var.gce_project_id}"
  region      = "${var.gce_region}"
}

resource "google_compute_disk" "default" {
  name = "test-disk"
  type = "pd-ssd"
  zone = "${var.gce_region}-b"
  size = 10
}

resource "google_compute_instance" "remove-me" {
  name           = "disk-test"
  machine_type   = "n1-standard-1"
  can_ip_forward = false
  zone           = "${var.gce_region}-b"

  disk {
    image = "ubuntu-1604-xenial-v20160429"
  }

  # DISK ADDED HERE (using the .name attribute instead of self_link yields the same result)
  disk {
    disk = "${google_compute_disk.default.self_link}"
  }

  network_interface {
    subnetwork = "${var.gce_subnetwork}"
    access_config {
      // Ephemeral IP
    }
  }

  service_account {
    scopes = ["compute-rw"] //this grands our filesystemd to modify itself
  }

  metadata {
    sshKeys = "${var.gce_ssh_user}:${file(var.gce_ssh_pub_key_file)}"
  }
}
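
For reference, attaching an existing persistent disk to a running instance is a single Compute Engine API call, so no recreation (or reboot) is required on the API side. A minimal standalone sketch in Go, assuming application default credentials; the "my-project" project ID is a placeholder:

package main

import (
    "context"
    "log"

    "golang.org/x/oauth2/google"
    compute "google.golang.org/api/compute/v1"
)

func main() {
    ctx := context.Background()

    // Authenticate with application default credentials.
    client, err := google.DefaultClient(ctx, compute.ComputeScope)
    if err != nil {
        log.Fatal(err)
    }
    svc, err := compute.New(client)
    if err != nil {
        log.Fatal(err)
    }

    // Attach the existing persistent disk "test-disk" to the running
    // instance "disk-test" without recreating or rebooting it.
    disk := &compute.AttachedDisk{
        DeviceName: "test-disk",
        Source:     "projects/my-project/zones/europe-west1-b/disks/test-disk",
    }
    op, err := svc.Instances.AttachDisk("my-project", "europe-west1-b", "disk-test", disk).Do()
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("attach operation started: %s", op.Name)
}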

Debug Output

https://gist.github.com/advanderveer/fc9e8ef73f1f0b0b8578fedf9140307e

Panic Output

No panic is produced

Expected Behavior

I would expect the 'google_compute_instance' not to be recreated.

Actual Behavior

It gets destroyed and recreated, i.e. terraform plan shows multiple "(forces new resource)" markers:

-/+ google_compute_instance.remove-me
    can_ip_forward:                                      "false" => "false"
    disk.#:                                              "1" => "2" (forces new resource)
    disk.0.auto_delete:                                  "true" => "true"
    disk.0.image:                                        "ubuntu-1604-xenial-v20160429" => "ubuntu-1604-xenial-v20160429"
    disk.1.auto_delete:                                  "" => "true" (forces new resource)
    disk.1.disk:                                         "" => "https://www.googleapis.com/compute/v1/projects/microfactory-test/zones/europe-west1-b/disks/test-disk" (forces new resource)
    machine_type:                                        "n1-standard-1" => "n1-standard-1"
    metadata.#:                                          "1" => "1"
    metadata.sshKeys:                                    "<my public ssh key>"
    metadata_fingerprint:                                "Mq9VPCUvh-E=" => "<computed>"
    name:                                                "disk-test" => "disk-test"
    network_interface.#:                                 "1" => "1"
    network_interface.0.access_config.#:                 "1" => "1"
    network_interface.0.access_config.0.assigned_nat_ip: "146.148.15.98" => "<computed>"
    network_interface.0.address:                         "10.0.0.3" => "<computed>"
    network_interface.0.name:                            "nic0" => "<computed>"
    network_interface.0.subnetwork:                      "microfactory-subnetwork" => "microfactory-subnetwork"
    self_link:                                           "https://www.googleapis.com/compute/v1/projects/microfactory-test/zones/europe-west1-b/instances/disk-test" => "<computed>"
    service_account.#:                                   "1" => "1"
    service_account.0.email:                             "1010578936991-compute@developer.gserviceaccount.com" => "<computed>"
    service_account.0.scopes.#:                          "1" => "1"
    service_account.0.scopes.299962681:                  "https://www.googleapis.com/auth/compute" => "https://www.googleapis.com/auth/compute"
    tags_fingerprint:                                    "42WmSpB8rSM=" => "<computed>"
    zone:                                                "europe-west1-b" => "europe-west1-b"

Steps to Reproduce


  1. terraform plan
  2. terraform apply

Important Factoids

Not that I'm aware of.

References

I couldn't find any; maybe #5241 is related.

Looks like it's probably because the schema has ForceNew: true

Schema: map[string]*schema.Schema{
    "disk": &schema.Schema{
        Type:     schema.TypeList,
        Required: true,
        ForceNew: true,
        ...

It'll also need some add/remove/update logic for management of the disks in resourceComputeInstanceUpdate.
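
Roughly, that logic would need to diff the old and new disk sets and issue the corresponding attach/detach calls against the API. A simplified sketch of the shape it could take (the standalone function, its map-based inputs, and the omitted operation polling are illustrative only, not the provider's actual code):

package disksketch

import (
    "fmt"

    compute "google.golang.org/api/compute/v1"
)

// syncDisks sketches the add/remove logic the update path would need once
// ForceNew is dropped from "disk": attach disks that appear in the desired
// set and detach disks that were removed. Maps are keyed by device name and
// hold the disk source URL.
func syncDisks(svc *compute.Service, project, zone, instance string,
    current, desired map[string]string) error {

    // Attach disks present in the new configuration but not currently attached.
    for name, source := range desired {
        if _, attached := current[name]; attached {
            continue
        }
        _, err := svc.Instances.AttachDisk(project, zone, instance, &compute.AttachedDisk{
            DeviceName: name,
            Source:     source,
        }).Do()
        if err != nil {
            return fmt.Errorf("error attaching disk %q: %v", name, err)
        }
        // A real implementation would wait for the zone operation to finish here.
    }

    // Detach disks that were removed from the configuration.
    for name := range current {
        if _, wanted := desired[name]; wanted {
            continue
        }
        if _, err := svc.Instances.DetachDisk(project, zone, instance, name).Do(); err != nil {
            return fmt.Errorf("error detaching disk %q: %v", name, err)
        }
        // Likewise, wait for the detach operation before continuing.
    }
    return nil
}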

I'll have a look at implementing this if I can find some time over the next few weeks, as it's an issue we've run into and it makes ongoing management of instances rather restrictive.

Any progress on this issue? Are google_compute* providers being used heavily in production? It feels like there are some sharp edges, such as this one.

I'll punch this out at HashiConf this week.

Hi there @evandbrown, sorry to ankle bite you here, any progress on this? It's blocking me from using GCP, and I really want to use GCP. :)

@jevonearth I'll jump on this tomorrow. Apologies that this slipped off my radar.

Hi @evandbrown, I'll be happy to help with any sort of testing that might be useful to you on this. :)

I would be glad to help too, @evandbrown .

I bumped into this - very happy to test any fixes.

@evandbrown I hate to bother you about this, but can you give an estimate of when this could be fixed? I'm at a point where I'm going to have to abandon Google Cloud Platform for a new project just because of this issue.

I know @evandbrown is working on some pretty high-priority stuff for us right now, but I should be able to take a look at this next week. I'll assign this to myself and keep you all updated.

Thank you @danawillow, my offer to help with testing stands :)

Just a quick update: I'm actively working on this and hope to have something ready to share by the end of the week (but no guarantees).

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.