Sweep all resources not working
aghassemlouei opened this issue · 6 comments
When leveraging the entire all.yml services list and using 0.4.1 on macOS 10.15.3 against the AWS GovCloud regions us-gov-west-1 and us-gov-east-1, where resource counts are higher than 800 per service, awsweeper hangs, and the config file often has to be cut down to a subset of services or to each service individually.
I had to break EBS, EIP, and security groups out into individual executions. Also, it appears that VPC peering and public IP associations make it difficult to delete VPCs.
Hi @aghassemlouei again. Thanks for providing the issue. I made some bigger changes and fixes, committed to master in the last week, but haven't released them yet. Can you check if your problems still occur on master (or with v0.5.0, which I will release by tomorrow)?
Evening @jckuester,
Just ran the following steps and ran into similar issues but with incremental improvements:
curl -LO https://github.com/cloudetc/awsweeper/releases/download/v0.5.0/terradozer-0.5.0-darwin-amd64.tar.gz
tar -xzf terradozer-0.5.0-darwin-amd64.tar.gz
chmod +x terradozer-0.5.0-darwin-amd64/terradozer
cat > custom.yml << EOF
aws_ami:
aws_autoscaling_group:
aws_cloudformation_stack:
aws_ebs_snapshot:
aws_ebs_volume:
aws_efs_file_system:
aws_eip:
aws_elb:
aws_instance:
aws_internet_gateway:
aws_key_pair:
aws_kms_alias:
aws_kms_key:
aws_launch_configuration:
aws_nat_gateway:
aws_network_acl:
aws_network_interface:
aws_route53_zone:
aws_db_instance:
aws_route_table:
aws_s3_bucket:
aws_security_group:
aws_subnet:
aws_vpc:
aws_vpc_endpoint:
EOF
./terradozer-0.5.0-darwin-amd64/terradozer --region us-gov-west-1 --profile canary --dry-run custom.yml
When executed all at once, services wouldn't fully enumerate their resources; however, when broken out into smaller chunks, e.g., s3 buckets and rds, things did work.
At least the s3 executions seem to be effective now, so I closed out #71. When I let the execution run over the weekend, the vpc peering connections were apparently throwing awsweeper/terraform for a loop with dependencies that couldn't be broken, so that may also be something to take into consideration if folks just import the all.yml and execute it.
Thanks again for the quick release; hopefully this data is useful and not bothersome!
Thanks for your feedback. I haven't tested awsweeper at scale yet and your insights are very interesting and helpful - I'll do my best to improve your experience with the tool. Let's go into more detail about what you experienced:
- "wouldn't fully enumerate their resources": does this happen during the listing/dry-run stage, before any deletion starts, or are all resources fully listed and only the deletion stage fails to enumerate them?
- Note that I haven't implemented pagination with the AWS API yet, which might also be causing only a limited number of resources to be listed per resource type. Breaking the config into smaller chunks shouldn't really help with that, though; running awsweeper several times should.
- "Vpc peering connections was throwing awsweeper/terraform for a loop": what did the output look like here? Did it say 'will retry to delete resource'? It might be that the max_retries parameter of Terraform is set too high (default 25), and therefore a failed deletion is retried too often and hangs for a very long time (hashicorp/terraform-provider-aws#1209, https://www.terraform.io/docs/providers/aws/index.html#max_retries). Someone added the max_retries parameter to awsweeper, but it is currently disabled. I will fix that.
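To illustrate the pagination point: a minimal Go sketch of following a continuation token until a listing is exhausted. The `page`, `listFunc`, `listAll`, and `fakeVolumes` names are hypothetical and the API is simulated; this is not awsweeper's code or the AWS SDK, only the general shape of the loop.

```go
package main

import "fmt"

// page is one response from a hypothetical paginated list call.
type page struct {
	IDs       []string
	NextToken string // empty when there are no more pages
}

// listFunc fetches the page that starts at token ("" means the first page).
type listFunc func(token string) (page, error)

// listAll follows NextToken until it is empty and returns every ID.
func listAll(fetch listFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, fmt.Errorf("listing page %q: %w", token, err)
		}
		all = append(all, p.IDs...)
		if p.NextToken == "" {
			return all, nil
		}
		token = p.NextToken
	}
}

// fakeVolumes simulates an API that returns at most pageSize IDs per call.
func fakeVolumes(total, pageSize int) listFunc {
	return func(token string) (page, error) {
		start := 0
		if token != "" {
			if _, err := fmt.Sscanf(token, "%d", &start); err != nil {
				return page{}, err
			}
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		p := page{}
		for i := start; i < end; i++ {
			p.IDs = append(p.IDs, fmt.Sprintf("vol-%04d", i))
		}
		if end < total {
			p.NextToken = fmt.Sprint(end)
		}
		return p, nil
	}
}

func main() {
	ids, err := listAll(fakeVolumes(1200, 500))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ids)) // 1200
}
```

Without the loop, only the first page (500 of the 1200 simulated IDs here) would be seen, which matches the symptom of a limited number of resources being listed per type.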
Hmm, I just looked into the code for how Terraform deletes a VPC (see below). In the case you described, it is a DependencyViolation (because the vpc peering connection is still attached), so Terraform will retry deleting for 5 minutes. This is not what we really want, and unfortunately the max_retries parameter mentioned above will not help here....
// Retry the delete for up to 5 minutes.
err := resource.Retry(5*time.Minute, func() *resource.RetryError {
	_, err := conn.DeleteVpc(deleteVpcOpts)
	if err == nil {
		return nil
	}
	// The VPC is already gone: treat as success.
	if isAWSErr(err, "InvalidVpcID.NotFound", "") {
		return nil
	}
	// Something is still attached to the VPC: keep retrying.
	if isAWSErr(err, "DependencyViolation", "") {
		return resource.RetryableError(err)
	}
	return resource.NonRetryableError(fmt.Errorf("Error deleting VPC: %s", err))
})
// After the retry window has elapsed, try one last time.
if isResourceTimeoutError(err) {
	_, err = conn.DeleteVpc(deleteVpcOpts)
	if isAWSErr(err, "InvalidVpcID.NotFound", "") {
		return nil
	}
}
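The retry behaviour in the excerpt above can be reproduced with the standard library alone. The following is a simplified analogue, not the Terraform helper itself: `retryUntil` and `errDependencyViolation` are hypothetical names, and the fixed polling interval is an assumption made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDependencyViolation stands in for the AWS "DependencyViolation" error code.
var errDependencyViolation = errors.New("DependencyViolation")

// retryUntil calls op until it succeeds, returns a non-retryable error,
// or the timeout elapses. Only errDependencyViolation is retried.
func retryUntil(timeout, interval time.Duration, op func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := op()
		if err == nil || !errors.Is(err, errDependencyViolation) {
			return err
		}
		// Stop if the next attempt would start after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	// A VPC with a peering connection still attached never becomes deletable,
	// so every attempt fails and the whole timeout is consumed.
	err := retryUntil(200*time.Millisecond, 50*time.Millisecond, func() error {
		attempts++
		return errDependencyViolation
	})
	fmt.Println(attempts, err)
}
```

With a 5 minute window instead of 200ms, each resource that is blocked by a dependency holds up the run for the full window, which would explain the long hangs described above.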
Hi @aghassemlouei again. I thought about the problem again and came up with a solution. Let me know what you think.
awsweeper can now be run with a timeout for the delete operation, i.e., awsweeper --timeout 1s config.yml.
This way, if a VPC or any other resource still has a dependency, the delete times out after, for example, 1s (the default is 20s). Here is what the output looks like:
• SHOWING RESOURCES THAT WOULD BE DELETED (DRY RUN)
---
Type: aws_vpc
Found: 1
Id: vpc-1234
Tags: [Name: foo]
---
• TOTAL NUMBER OF RESOURCES THAT WOULD BE DELETED: 1
• Are you sure you want to delete these resources (cannot be undone)? Only YES will be accepted.
Enter a value: YES
• STARTING TO DELETE RESOURCES
• will retry to delete resource id=vpc-1234 type=aws_vpc
• FAILED TO DELETE THE FOLLOWING RESOURCES (RETRIES EXCEEDED): 1
• aws_vpc error=destroy timed out (1s) id=vpc-1234
• TOTAL NUMBER OF DELETED RESOURCES: 0
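One way such a per-resource timeout could be put around a blocking delete is sketched below. This is an assumption about the general technique, not awsweeper's actual implementation; `destroyWithTimeout` is a hypothetical name and the slow delete is simulated.

```go
package main

import (
	"fmt"
	"time"
)

// destroyWithTimeout runs destroy in a goroutine and gives up after timeout.
// The goroutine is not cancelled; it is only abandoned.
func destroyWithTimeout(timeout time.Duration, destroy func() error) error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { done <- destroy() }()

	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return fmt.Errorf("destroy timed out (%s)", timeout)
	}
}

func main() {
	// Simulates a delete stuck on a dependency far longer than the timeout.
	slowDelete := func() error {
		time.Sleep(2 * time.Second)
		return nil
	}
	err := destroyWithTimeout(100*time.Millisecond, slowDelete)
	fmt.Println(err) // destroy timed out (100ms)
}
```

The trade-off in this sketch is that the abandoned call keeps running in the background until it returns on its own, so a short timeout gives faster feedback but does not stop the underlying retries.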
This worked significantly better! If for nothing else than the feedback presented to the end user. Syntax provided for posterity:
curl -LO https://github.com/cloudetc/awsweeper/releases/download/v0.7.0/awsweeper_0.7.0_darwin_amd64.tar.gz
tar -xzf awsweeper_0.7.0_darwin_amd64.tar.gz
chmod +x awsweeper_0.7.0_darwin_amd64/awsweeper
cat > custom.yml << EOF
aws_ami:
aws_autoscaling_group:
aws_cloudformation_stack:
aws_ecs_cluster:
aws_ebs_snapshot:
aws_ebs_volume:
aws_efs_file_system:
aws_eip:
aws_elb:
aws_iam_instance_profile:
aws_iam_role:
aws_instance:
aws_internet_gateway:
aws_key_pair:
aws_kms_alias:
aws_kms_key:
aws_lambda_function:
aws_launch_configuration:
aws_nat_gateway:
aws_network_acl:
aws_network_interface:
aws_db_instance:
aws_route53_zone:
aws_route_table:
aws_s3_bucket:
aws_security_group:
aws_subnet:
aws_vpc:
aws_vpc_endpoint:
EOF
./awsweeper_0.7.0_darwin_amd64/awsweeper --region us-gov-west-1 --profile core --timeout 1s custom.yml
The failure conditions were far more clear with a faster turnaround. The only cosmetic bit of feedback would be regarding the AWS-managed IAM roles or the KMS keys. Terraform seems to complain but it's definitely a non-issue:
error deleting IAM Role (AWSServiceRoleForSupport) policy attachments: Error deleting IAM Role AWSServiceRoleForSupport: UnmodifiableEntity: Cannot perform the operation on the protected role 'AWSServiceRoleForSupport' - this role is only modifiable by AWS
AccessDeniedException: User: arn:aws-us-gov:iam::123456789:user/aghassemlouei is not authorized to perform: kms:ScheduleKeyDeletion on resource: arn:aws-us-gov:kms:us-gov-west-1:123456789:key/1234567-1234-1234-1234-1234567
Closing this out as the major issues have been addressed; thanks for all your hard work!