VirtuslabRnD/pulumi-kotlin

Add scripts to GitHub Actions that will clean up after failed E2E tests

jplewa commented

In August, there were a lot of issues with GH Actions, so I guess we had more failed E2E tests than usual. A few days ago, the VL admins asked us to add some sort of cron job for cleaning up leftovers from E2E tests. I have been checking GCP and deleting leftover VMs periodically, but I forgot to take a look at Azure, and it turns out we had some VMs there. It's not much money, less than $100 since the beginning of the year, but we should be more careful in the future.

For now, I went through all our stacks on Pulumi, refreshed them (to account for any machines that may have already been dropped manually), destroyed the remaining resources, and deleted the stacks. I'll keep checking the list of stacks periodically, but we might want to automate this.

The following script was sent to me by the admin:

# Delete every resource group whose name contains 'azure'
for n in $(az group list --query "[?contains(name,'azure')].name" -o tsv)
do
    echo "Working on $n"
    az group delete --name "$n" --yes
done

It should take care of leftover resources on Azure, but it leaves some questions:

  1. What about GCP?
  2. How do we ensure that we're not dropping stuff in the middle of running E2E tests?
  3. On GCP we definitely don't want to drop our CI runners ;)
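
One possible way to cover questions 1 and 3 is to delete only instances that carry a label set by the E2E tests, so that the CI runners are never selected. This is a rough sketch, not something we have in place: the `purpose=e2e-test` label is hypothetical (the tests would have to start setting it), and the sketch does nothing about question 2, since it would also delete VMs belonging to a test that is still running.

```shell
#!/usr/bin/env bash
# Sketch: delete only GCP instances that carry an E2E-test label.
# Assumption: the E2E tests label their VMs with purpose=e2e-test
# (hypothetical label). CI runners do not carry it, so they are skipped.

E2E_LABEL="${E2E_LABEL:-purpose=e2e-test}"

delete_e2e_instances() {
    local name zone
    # Prints one "<name><TAB><zone>" line per matching instance.
    gcloud compute instances list \
        --filter="labels.${E2E_LABEL}" \
        --format="value(name,zone.basename())" |
    while read -r name zone; do
        echo "Deleting $name in $zone"
        # </dev/null keeps gcloud from consuming the loop's input.
        gcloud compute instances delete "$name" --zone "$zone" --quiet </dev/null
    done
}

# Nothing is deleted unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    delete_e2e_instances
fi
```

Running the script without `--run` only defines the function, which makes it safe to source and try out against a test project first.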

Alternatively, we could write a script that would refresh leftover stacks, destroy their resources, and delete the stacks, but we would also have to ensure that we're not doing that while tests are running (maybe it's possible to fetch the date of the last time a stack was updated?).
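
A minimal sketch of what such a script could look like, under several assumptions that are not verified here: `jq` is available on the runner, the runner is logged in to Pulumi and runs from inside the project directory, `pulumi stack ls --json` reports `lastUpdate` (and, on the Pulumi service backend, `updateInProgress`) for each stack, and any stack untouched for longer than `MAX_AGE_HOURS` (a made-up threshold) cannot belong to a running test.

```shell
#!/usr/bin/env bash
# Sketch: clean up Pulumi stacks that have not been updated recently.
# Assumptions: jq is installed, the runner is logged in to Pulumi, and
# MAX_AGE_HOURS (hypothetical threshold) is longer than any E2E run.

MAX_AGE_HOURS="${MAX_AGE_HOURS:-6}"

# is_stale LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_HOURS
# Succeeds when the last update is more than MAX_AGE_HOURS in the past.
is_stale() {
    local last_update="$1" now="$2" max_age_hours="$3"
    [ $(( now - last_update )) -gt $(( max_age_hours * 3600 )) ]
}

# Prints "<stack name> <last update in epoch seconds>" for every stack
# that has a lastUpdate timestamp and no update in progress.
list_idle_stacks() {
    pulumi stack ls --json | jq -r '
        .[]
        | select(.lastUpdate != null and .updateInProgress != true)
        | "\(.name) \(.lastUpdate | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601)"'
}

cleanup_stale_stacks() {
    local now name last_update
    now="$(date +%s)"
    list_idle_stacks | while read -r name last_update; do
        is_stale "$last_update" "$now" "$MAX_AGE_HOURS" || continue
        echo "Cleaning up $name"
        # Same steps as the manual cleanup: refresh, destroy, delete.
        # </dev/null keeps pulumi from consuming the loop's input.
        pulumi refresh --stack "$name" --yes --skip-preview </dev/null &&
            pulumi destroy --stack "$name" --yes --skip-preview </dev/null &&
            pulumi stack rm "$name" --yes </dev/null
    done
}

# Nothing is touched unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    cleanup_stale_stacks
fi
```

The age check is kept in its own function so the threshold logic can be tried without touching any real stacks, e.g. `is_stale 0 7200 1 && echo stale`. Stacks from other projects would need either a run per project directory or a different way of listing them.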