loafoe/terraform-provider-ssh

Cannot connect to Vsphere VM

jcomish opened this issue · 13 comments

I have created a Rancher2 module for use in both AWS and an on-prem vSphere environment. The module uses ssh_resource without any problems in AWS, but I cannot get it working on vSphere.

In both scenarios, I generate an ED25519 key in Terraform and load it onto the VM using a remote-exec provisioner.

resource "tls_private_key" "global_key" {
  algorithm = "ED25519"
}

After that, I would expect to be able to use ssh_resource without problems, but it keeps failing with this vague error:

Error: execution of command 'echo 'hello world'' failed. stdout output

   with module.rancher_common.ssh_resource.run_command,
   on ..\..\Common\RancherCore\main.tf line 34, in resource "ssh_resource" "run_command":
   34: resource "ssh_resource" "run_command" {

The strange thing is that a remote-exec provisioner using the same key works just fine. See my code below:

resource "null_resource" "run_command" {
  provisioner "remote-exec" {
    inline = [
      "echo 'hello world'",
    ]
    connection {
      type        = "ssh"
      user        = "******"
      private_key = var.ssh_private_key_pem
      host        = var.node_public_ip
      timeout     = "5m"
    }
  }
}

resource "ssh_resource" "run_command" {
  host = var.node_public_ip
  commands = [
    "echo 'hello world'"
  ]
  user        = "******"
  private_key = var.ssh_private_key_pem
}

Any idea what might be happening here?

@jcomish can you set debug_log (https://registry.terraform.io/providers/loafoe/ssh/latest/docs#debug_log) and see if there is more info available there? It could be a compatibility issue with the key, not sure.
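
For reference, a minimal sketch of what enabling this might look like. It assumes debug_log is a provider-level argument that takes a path to a log file, as the linked documentation anchor suggests; the file name is only a placeholder.

```hcl
# Sketch only: assumes debug_log accepts a file path on the provider block.
provider "ssh" {
  # Hypothetical log location; pick any writable path.
  debug_log = "ssh-provider-debug.log"
}
```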

Here are the contents of the log:

command: echo 'hello world'
done: true
stdout:
hello world

stderr:

command: echo 'hello world'
done: false
stdout:

stderr:

error: dial tcp xx.xx.xx.xx:xx: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

I just noticed that I am embarrassingly out of date: v1.0.1.
But it looks like newer versions of the provider are not available for Windows. Is this intentional?

I managed to upgrade the provider manually and set a timeout of 15 minutes. It still fails immediately with the same error.

The v2.2.1 release should be available for Windows amd64 and arm64 platforms (32-bit builds are not available).

For some reason, v2.2.1 was not being installed by terraform init -upgrade, which wouldn't install anything beyond v1.0.1.
I installed v2.2.1 manually and got the same error, so the provider version isn't the problem.
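
For anyone hitting the same install issue, one thing worth checking is whether the configuration declares an explicit version constraint. A minimal sketch using standard Terraform provider requirements follows; the source address is taken from the registry URL above, and the constraint value is an assumption based on the release mentioned in this thread.

```hcl
terraform {
  required_providers {
    ssh = {
      source  = "loafoe/ssh"
      # Assumed constraint: require at least the release discussed here.
      version = ">= 2.2.1"
    }
  }
}
```

If the provider is used inside a module, the same required_providers entry generally needs to appear in that module as well, and any stricter constraint elsewhere in the configuration can still hold the selected version back.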

Any idea how I can work around this? I am now on the latest version, but the problem persists.

@jcomish not able to reproduce this locally. I did Google a bit and it seems that specific error only happens in the context of Windows environments. Are you running Terraform on Windows? And if so, do you have the ability to try it on Linux (e.g. WSL2)? Unfortunately I don't have access to Windows.

I am running on Windows, but I just tried it on WSL with the same result.

I have done some additional testing and found that it works if I run Terraform a second time. Something must be going on with the state of the new machine.

It is still strange that the null_resource connects immediately but ssh_resource does not. I am still investigating the root cause.

I now have it working. It appears that Terraform was starting the module containing the ssh_resource before SSH was fully configured on the machine.

The null_resource worked because its remote-exec connection retries until the specified timeout. ssh_resource was failing immediately instead of retrying, even with the timeout set to 10m. Would it make sense to have a variable that enables retry-until-timeout within ssh_resource as well?
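
Until something like that exists, one possible workaround is to let a remote-exec connection do the waiting and make ssh_resource depend on it. This is only a sketch built from the resources shown earlier in this issue: the name wait_for_ssh is hypothetical, and it relies on the retry behavior of remote-exec described above.

```hcl
# Gate resource: its connection keeps retrying until SSH answers
# or the timeout expires. The name wait_for_ssh is hypothetical.
resource "null_resource" "wait_for_ssh" {
  provisioner "remote-exec" {
    inline = [
      "echo 'ssh is ready'",
    ]
    connection {
      type        = "ssh"
      user        = "******"
      private_key = var.ssh_private_key_pem
      host        = var.node_public_ip
      timeout     = "10m"
    }
  }
}

resource "ssh_resource" "run_command" {
  host = var.node_public_ip
  commands = [
    "echo 'hello world'"
  ]
  user        = "******"
  private_key = var.ssh_private_key_pem

  # Only start once the gate above has connected successfully.
  depends_on = [null_resource.wait_for_ssh]
}
```

The depends_on ordering means ssh_resource only makes its first connection attempt after the machine has already accepted one SSH session, so its lack of retries no longer matters.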

@jcomish yes, that definitely makes sense. I think there is an existing ticket for this (#31). I'll look into it over the weekend.

Excellent, thank you for your help, and great job with this!