nix-community/terraform-nixos

Random file provisioner error and SSH authentication failure with AWS EC2

spearman opened this issue · 3 comments

Describe the bug
When provisioning a new instance, it often (though not always) fails with a "file provisioner error" reporting that SSH authentication failed.

To Reproduce
Run terraform init and terraform apply with the following configuration (main.tf is placed in terraform/main.tf, and the .nix files are in nixos/configuration.nix and nixos/git-server.nix): https://gist.github.com/spearman/58db5a31afd88c8962d9a5b3da78ac00

Expected behavior
I would expect it to be reproducible and not fail randomly.

Environment

  • OS name + version: NixOS 21.11
  • Version of the code: rev 646cacb

Additional context
Here is the full output when running terraform apply:

https://gist.github.com/spearman/5f19ffb4c80791f0444c4a2a3b88afab

This run was after the instance had been successfully deployed and I was trying to change the configuration. When the error occurs during creation, I can usually log in as root with the generated .pem file, but the NixOS configuration has not been applied.

I thought maybe it was a problem with the particular AMI I was using, but I have experienced the problem with 20.09, 21.05, and 21.11.

I experienced this error after deploying OpenSSH 8.8 to a remote instance. It turned out that OpenSSH 8.8 disables the ssh-rsa signature algorithm (RSA with SHA-1) by default for security reasons, and the Terraform provisioner does not yet work with the newer rsa-sha2 algorithms (hashicorp/terraform#30134).

As a workaround you can add the following to your system configuration:

services.openssh.extraConfig = ''
  HostKeyAlgorithms +ssh-rsa
  PubkeyAcceptedAlgorithms +ssh-rsa
'';
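
For context, here is a minimal sketch of where this setting might sit in a NixOS module such as nixos/configuration.nix. The surrounding module layout and the services.openssh.enable line are assumptions for illustration, not taken from the gist above:

```nix
{ config, pkgs, ... }:
{
  # Assumption: sshd is enabled on the instance (needed for the provisioner to connect at all).
  services.openssh.enable = true;

  # Workaround: re-allow the ssh-rsa (RSA with SHA-1) signature algorithm,
  # which OpenSSH 8.8 disables by default. The text is appended verbatim to sshd_config.
  services.openssh.extraConfig = ''
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
  '';
}
```

Since this re-enables an algorithm that upstream turned off for security reasons, it is probably best treated as temporary and removed once the provisioner can negotiate rsa-sha2 signatures or the keys are switched to another type.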

A better(?) solution should arrive once hashicorp/terraform-provider-tls#150 is released, hopefully soon: you could then switch your keys from rsa to ed25519 and avoid this issue altogether.

I'm not sure if this is related, but I have been trying to deploy using GitLab CI and I get the error on the same line. The last error, however, is an i/o timeout, not an SSH authentication error:

https://gist.github.com/spearman/6c44d4a354a3644d6e75f74c2d98fd91