boltops-tools/terraspace

Terraspace all should halt on any error

johnlister opened this issue · 2 comments

Checklist

  • [X ] Upgrade Terraspace: Are you using the latest version of Terraspace? This allows Terraspace to fix issues fast. There's an Upgrading Guide: https://terraspace.cloud/docs/misc/upgrading/
  • Reproducibility: Are you reporting a bug others will be able to reproduce and not asking a question. If you're unsure or want to ask a question, do so on https://community.boltops.com
  • Code sample: Have you put together a code sample to reproduce the issue and make it available? Code samples help speed up fixes dramatically. If it's an easily reproducible issue, then code samples are not needed. If you're unsure, please include a code sample.

My Environment

Software          Version
Operating System  Win 10 WSL v2
Terraform         1.3.6
Terraspace        2.2.3
Ruby              3.0.3p157

Expected Behaviour

terraspace all should stop on any error at each batch level. Further processing needs to be halted if any stack results in an error.

Current Behavior

terraspace all continues to process further batches in the tree after a stack fails. For terraspace up, this is usually fairly safe, as any dependencies will normally fail or at worst generate incorrect resources. For terraspace down, however, this can leave the other stacks in an inconsistent state. As the tree is parsed backwards, the stacks at the top of the tree (dependencies) are processed and destroyed. This forcibly breaks any dependencies in stacks in the lower batches and prevents them from being cleaned up or applied.
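To make the batch ordering concrete, here is a minimal Ruby sketch of grouping stacks into batches from a dependency map and reversing the order for a down run. It is not Terraspace's implementation; `build_batches` and the `a`/`b`/`c` stack names are hypothetical and only illustrate the idea.

```ruby
# Hypothetical sketch, not Terraspace's actual code.
# deps maps each stack name to the stacks it depends on.
# Every stack must appear as a key, even if it has no dependencies.
def build_batches(deps)
  remaining = deps.transform_values(&:dup)
  batches = []
  until remaining.empty?
    # A stack is ready once all of its dependencies were placed in earlier batches.
    ready = remaining.select { |_, needs| needs.empty? }.keys.sort
    raise ArgumentError, "dependency cycle or unknown stack" if ready.empty?
    batches << ready
    ready.each { |name| remaining.delete(name) }
    remaining.each_value { |needs| needs.reject! { |n| ready.include?(n) } }
  end
  batches
end

# Illustrative graph: a and c both depend on b.
DEPS = { "a" => ["b"], "b" => [], "c" => ["b"] }

UP_BATCHES = build_batches(DEPS)    # dependencies first
DOWN_BATCHES = UP_BATCHES.reverse   # dependents first, dependencies last
```

In this sketch a down run handles `a` and `c` before `b`. If a batch fails and processing continues anyway, `b` is destroyed while a stack that still needs its outputs remains.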

Step-by-step reproduction instructions

For example, if A depends on B and A fails to be destroyed on terraspace down, then B will be destroyed anyway. Any dependencies in A now return an error as the outputs from B no longer exist. This results in an invalid state file for B, which cannot be destroyed.

logs: (truncated)

Batch Run 1:
Running: terraspace down monitoring Logs: log/down/monitoring.log
terraspace down monitoring:  Destroy complete! Resources: 0 destroyed.
Batch Run 2:
Running: terraspace down batch_operations Logs: log/down/batch_operations.log
Running: terraspace down claims_management Logs: log/down/claims_management.log
terraspace down batch_operations:  No changes. No objects need to be destroyed.
terraspace down batch_operations:  Destroy complete! Resources: 0 destroyed.
terraspace down claims_management:  │ Error: No value for required variable
terraspace down claims_management:  │ Error: No value for required variable
Error running: terraspace down claims_management. Fix the error above or check logs for the error.
Batch Run 3: <===== Should fail here and not do batch 3
Running: terraspace down application_common Logs: log/down/application_common.log
terraspace down application_common:  No

Solution Suggestion

terraspace all should stop on any error in any stack at each batch level.

Looks like there is an exit_on_fail config param which should default to true. However, this doesn't appear to work.

Thanks for the report. I kind of remember this: at some point I was testing terraspace all down in rapid-fire fashion and felt like config.all.exit_on_fail.down = false was a better default. Forgot to update the docs, though. Updated the docs so they reflect the default in the source code.

Got some mixed feelings about whether the default for config.all.exit_on_fail.down should be true or false. Say there are dependency stacks A -> B -> C, and they're destroyed in C -> B -> A order. If it fails halfway at B, then running terraspace all down again will never reach B, since C was already successfully deleted on the first attempt. The config.all.exit_on_fail.down = false setting is useful when I was trying to destroy and clean up things over and over and something would fail. Otherwise, C will keep failing, we would never get past C, and it would exit. This is why I adjusted the default to false.

At the time, I did not run into an issue with an invalid state file 😔 Right now I'm still leaning toward leaving the default of config.all.exit_on_fail.down = false. If you want it to be true:

config/app.rb

Terraspace.configure do |config|
  config.all.exit_on_fail.down = true
end
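
To illustrate what the setting changes, here is a small Ruby simulation of the batch run from the logs above. It is only a sketch under assumptions: `run_all` is a hypothetical helper, not part of Terraspace, and it assumes every stack in a batch runs before the halt decision is made for that batch.

```ruby
# Hypothetical simulation; run_all is illustrative and not a Terraspace API.
# Returns the stacks that were processed, in order.
def run_all(batches, failing:, exit_on_fail:)
  processed = []
  batches.each do |batch|
    batch_failed = false
    batch.each do |stack|
      processed << stack
      batch_failed = true if failing.include?(stack)
    end
    # Halt before the next batch only when the flag is set.
    break if batch_failed && exit_on_fail
  end
  processed
end

# Batch layout and failing stack taken from the truncated logs in this issue.
BATCHES = [
  ["monitoring"],
  ["batch_operations", "claims_management"],
  ["application_common"]
]
FAILING = ["claims_management"]

HALTED = run_all(BATCHES, failing: FAILING, exit_on_fail: true)
CONTINUED = run_all(BATCHES, failing: FAILING, exit_on_fail: false)
```

With `exit_on_fail: true`, the simulation stops after batch 2 and never touches application_common. With `false`, it carries on into batch 3, which matches the behaviour reported in the logs.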