nginxinc/kic-reference-architectures

runner down fails at ECR step

monrax opened this issue · 4 comments

Describe the bug
Pulumi throws an error at the ECR step when running runner down. Every subsequent attempt fails until the nginx-ingress image in the ECR repository is deleted manually.

To Reproduce
Steps to reproduce the behavior:

  1. Clone repo
  2. Set up virtual environment with setup_venv.sh
  3. Run runner -p aws -s new up and wait for all the resources to be created
  4. Initiate stack tear down with runner -p aws -s new down
  5. Pulumi throws an error at the ECR step

Expected behavior
Stack tear down process happens without any errors.

Your environment

  • Version of the repo - commit 1b76f7b
  • Version of KIC: 2.4.1
  • Version of infrastructure tooling (e.g. Pulumi)
  • Version of executing environment (e.g. Python, node): Python 3.10.6
  • OS and distribution: Linux (kernel version 5.15.0-1022-aws), Ubuntu 22.04.1
  • Details about containerization or virtualization environment: t3.large-type EC2 instance running Ubuntu 22.04.1

Additional context

ECR Repository (ingress-controller-new) not empty, consider using force_delete: RepositoryNotEmptyException: The repository with name 'ingress-controller-new' in registry with id '<ID>' cannot be deleted because it still contains images

@monrax - thanks for the report; the Jenkins jobs force the deletion by default, but it looks like we need to port that logic into runner.

As a work-around, the following logic is used by the automated Jenkins jobs to ensure we can re-run. It's hacky and sloppy and needs to be fixed...but if you really need to move forward...

https://github.com/nginxinc/kic-reference-architectures/blob/master/extras/jenkins/AWS/Jenkinsfile#L184-L206
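For anyone who would rather script the manual cleanup than delete the image by hand, here is a minimal sketch of emptying the repository before re-running runner down. It is not a copy of the Jenkinsfile logic linked above. The helper name purge_ecr_images is hypothetical, and the sketch assumes a boto3 ECR client with permission to list and delete images; the repository name comes from the error message in this issue.

```python
from typing import Any, Dict, Iterator, List


def _chunks(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def purge_ecr_images(ecr_client: Any, repository_name: str) -> int:
    """Delete every image in an ECR repository (hypothetical helper).

    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client("ecr").
    Returns the number of image IDs that ECR reports as deleted.
    """
    # Collect all image IDs, following pagination.
    image_ids: List[Dict[str, Any]] = []
    paginator = ecr_client.get_paginator("list_images")
    for page in paginator.paginate(repositoryName=repository_name):
        image_ids.extend(page.get("imageIds", []))

    # batch_delete_image accepts at most 100 image IDs per call.
    deleted = 0
    for batch in _chunks(image_ids, 100):
        response = ecr_client.batch_delete_image(
            repositoryName=repository_name,
            imageIds=batch,
        )
        deleted += len(response.get("imageIds", []))
    return deleted


# Usage (requires boto3 and AWS credentials; not run here):
#   import boto3
#   purge_ecr_images(boto3.client("ecr"), "ingress-controller-new")
```

Once the repository is empty, the ECR step of the tear down should no longer hit RepositoryNotEmptyException.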

Hey @qdzlug, thank you for the work-around. The way I ended up avoiding this issue was passing True to the force_delete parameter here, but I'm not sure if this is the preferred way of deleting this resource.
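To make the change concrete, a rough sketch of what passing force_delete looks like on a Pulumi ECR repository resource follows. The function name create_ingress_repo and the resource naming pattern are assumptions for illustration only and may not match the project's actual ECR module; the import is done lazily purely so the snippet can be loaded without Pulumi installed.

```python
def create_ingress_repo(stack_name: str, force_delete: bool = True):
    """Declare an ECR repository that can be destroyed while it still holds images.

    Hypothetical helper: the naming pattern below is an assumption modeled on
    the 'ingress-controller-new' repository named in the error message.
    """
    # Lazy import so this module loads even where pulumi_aws is not installed.
    import pulumi_aws as aws

    repo_name = f"ingress-controller-{stack_name}"
    return aws.ecr.Repository(
        repo_name,
        name=repo_name,
        # With force_delete=True the provider removes the repository even if
        # it still contains images.
        force_delete=force_delete,
    )
```

One caveat, as far as I understand the provider's behavior: the new force_delete value has to be recorded in the stack state by an update before a destroy will honor it, so an existing stack would likely need one more up before down succeeds.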

@monrax - glad you were able to get it working, and thanks for sharing your code - I haven't dug into it much yet, but I think that may be the winning fix. Now, I just need to make sure that putting that in the mainline isn't going to cause issues for anyone with existing infrastructure.