loafoe/terraform-provider-ssh

Ability to control retries for different stages of SSH command execution


SSH command execution has several stages: establishing the connection, authenticating, uploading files, and running commands. We would like better control over retries on errors in each of these stages to make development and testing easier. Currently, this provider retries on all errors until the timeout is reached; however, we often want to disable retries or limit them to a small number:

  • We do not want to retry command execution when we know the failure is not recoverable (e.g. a manual cleanup is needed before rerunning).
  • We do not want to retry on authentication failures, since such errors are often not recoverable.
  • We do not want to retry connection attempts in DEV: a failure there often indicates that we are connecting to the wrong server or that the server has not finished initializing. We want to detect such issues early and control execution timing through resource dependencies, to minimize the number of production errors caused by resource provisioning delays.
  • We do not want to retry file uploads in DEV, as failures there are often caused by configuration issues such as incorrect file paths or permissions.

Proposal:
Introduce the following flags to control the number of retries for each stage of execution:

  • connection_retries / bastion_connection_retries - controls retries on network-related connection errors: an unreachable address (connection timeout), an SSH service that has not started yet (connection refused or connection reset), and domain name resolution errors.
  • auth_retries / bastion_auth_retries - controls retries when authentication fails (incorrect key or password).
  • file_upload_retries - controls the number of retries when uploading files.
  • command_retries - controls the number of retries when running commands.
  • pre_command_retries - controls the number of retries when running pre-commands.

Values:

  • 0 - no retries
  • -1 - unlimited retries, bounded only by the timeout (the default, for backward compatibility)
  • N (a positive integer) - the number of retries