Gruntwork test ACM certificate getting removed in `us-west-1` and `eu-west-2`
Closed this issue · 25 comments
The wildcard ACM certificate used for testing in phxdevops
is getting removed by the scheduled cloud-nuke
job, but only in us-west-1
and eu-west-2
regions. We need this key to be present in these two regions for tests.
I haven't had a chance to dig in to see if we are using a config exclusion, and it isn't working, or if there is some other reason for the removal. It appears to be only these two regions, the key does not get deleted in other regions.
Temporarily disabled tests for these regions in terraform-aws-load-balancer (via this commit) to minimize test failures. Leaving a note here as a reminder to re-enable these regions once this issue is resolved.
Looked at the existing configuration and it doesn't seem to have any filtering for ACM resource type - https://github.com/gruntwork-io/cloud-nuke/blob/master/.circleci/nuke_config.yml.
Also, we don't pass in ACM as the exclude resource type - https://github.com/gruntwork-io/cloud-nuke/blob/master/.circleci/config.yml#L45.
It seems like we have an additional filter to exclude resources that are currently being used. However, I guess the certificates in those regions were not being used for some region?
Hey @arsci, I created certificates in both us-west-1
and eu-west-2
regions. I'm not sure if creating certificates solves the test failures. I'll leave this ticket open until we verify whether tests pass again.
Thanks @hongil0316! Yes, just creating those certs should solve the tests errors.
@gcagle3 you can re-enable tests for your branch in those two regions
@gcagle3 you can re-enable tests for your branch in those two regions
These have been re-enabled. Thanks!
Nice. I'll close this issue then!
It seems like the ACM for gruntwork testing seems to have domain name *.gruntwork.in so we can create a filter for that
![]()
@hongil0316 would it be possible to add a filter for the domain name 'gruntwork.in' as well? It looks like we'll need ACM certs for both '*.gruntwork.in' and 'gruntwork.in' for all of the load balancer tests to complete.
We'll need this for the following regions (if the filter is region specific, if not no worries):
- us-east-1
- us-east-2
- us-west-1
- us-west-2
- eu-west-1
- eu-west-2
I believe this change should cover both scenarios:
Ah, that should work, splendid. Thank you, I'll re-close this one!
@hongil0316 hm, some bad news. It looks like the certificates for just 'gruntwork.in' (not *.gruntwork.in) are still being deleted every night.
As a thought, should the 'include' in this code actually be exclude? If I'm reading this right, it looks like 'include' is used to identify resources to be nuked and 'exclude' is being used to protect resources from deletion. In this case, we want to protect both *.gruntwork.in and gruntwork.in certs.
ACM:
include:
names_regex:
- "^gruntwork.in"
@hongil0316 would there be anything else scheduled to clear these certs out? Monitoring things from yesterday, it looks like all of the 'gruntwork.in' certs were wiped out again last night.
Noting the following changes in the Phoenix account:
- In us-east-1, us-east-2, us-west-2, and eu-west-1, the 'gruntwork.in' cert was deleted
- In us-west-1 and eu-west-2, both the 'gruntwork.in' and '*.gruntwork.in' certs were deleted
Hey @gcagle3 @arsci, I just tested the cloud-nuke
behaviour with the circleCi nuke_config.yml
and it seems to work as expected. This is the command line I used:
aws-vault exec phxdevops -- go run main.go aws --resource-type acm --log-level debug --region us-east-1 --config .circleci/nuke_config.yml
From this execution, I verified that it seems to be indeed deleted by circle ci - https://app.circleci.com/pipelines/github/gruntwork-io/cloud-nuke/12355/workflows/ac53f08b-cbbe-41b8-8f58-3eaadbb781d9/jobs/34125.
Hmm the debug message doesn't seem to help too much... I wonder if the config file values are being reflected properly.
The odd thing is that we are having different behaviour when running in circleCi vs. locally...
The odd thing is that we are having different behaviour when running in circleCi vs. locally...
Would CircleCi be setting specific environment variables that would change the behavior?
Looking at the debug logs, it seems like the config is not properly parsed for whatever reason:
DEBUG shouldInclude result for ACM: arn:aws:acm:us-west-1:087285199408:certificate/1fc80fe9-5bcf-43d9-86c1-e5a6d3d6ffd9 w/ domain name: gruntwork.in, time: 2023-10-11 19:46:05.48 +0000 UTC, and config: {IncludeRule:{NamesRegExp:[] TimeAfter:<nil> TimeBefore:<nil> Tag:<nil>} ExcludeRule:{NamesRegExp:[] TimeAfter:2023-10-11 19:52:45.931855658 +0000 UTC m=-7198.437556645 TimeBefore:<nil> Tag:<nil>}}
You will notice that NamesRegExp
for both IncludeRule
and ExcludeRule
are empty.
After internal discussion, we're wondering if the default tests (which run with no config) might be causing this. Looking at the test, it will check if the cert is in use and try to delete if it is not in use.
Comparing the above against the following we've observed:
- In us-east-1, us-east-2, us-west-2, and eu-west-1, the 'gruntwork.in' cert was deleted
- In us-west-1 and eu-west-2, both the 'gruntwork.in' and '*.gruntwork.in' certs were deleted
It is important to note that the '*.gruntwork.in' cert is 'in use' in the us-east-1, us-east-2, us-west-2 and eu-west-1 regions where it is not being deleted. It is not in use in us-west-1 / eu-west-2 where it is being deleted, which would make sense if the tests are actually deleting the certs. All certs being deleted are not in use.
@hongil0316 do you think it might be worth adjusting the test case to address this?
Had further discussion internally here: https://gruntwork-io.slack.com/archives/C6V3DJAHJ/p1697159092430149
We initially mitigated the issue by excluding ACM resource type - https://github.com/gruntwork-io/cloud-nuke/pulls?q=is%3Apr+is%3Aclosed.
After further troubleshooting, I realized that the bug existed when we had both config file + time filter present at the same time. Here is a fix for the bug - #607
As a quick follow-up, I can confirm that both the *.gruntwork.in and gruntwork.in certificate have not been deleted (and are still present) since this PR was merged. Great work on this @hongil0316!