kubernetes-sigs/image-builder

Unable to override AZURE_LOCATION for VHD builds

kkeshavamurthy opened this issue · 9 comments

What steps did you take and what happened:
#1005 added code to randomize the location of the Azure storage account:

export AZURE_LOCATION="$(get_random_region)"

Because of this, any AZURE_LOCATION set by the user is overwritten. While this might work for CI, we have resources such as VPNs and subnets that are available only in a particular region, so creating the storage account in the region specified by AZURE_LOCATION is crucial.

What did you expect to happen:
The location specified by AZURE_LOCATION should be used.

Anything else you would like to add:

CC: @jsturtevant @mboersma @willie-yao

/kind bug

@mboersma @jsturtevant
Any thoughts on how we can work around this?
Is it possible to check if a CI flag is set and only then randomize the region? That way the AZURE_LOCATION provided by the user is honored.

Could we do something like:

location="$(get_random_region)"
export AZURE_LOCATION="${AZURE_LOCATION:-$location}"

That would respect the passed in value but provide a default if not specified?

I tried that, but since AZURE_LOCATION is already set on line 14, the get_random_region fallback never takes effect.
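A minimal sketch of why the fallback has no effect, assuming the earlier assignment on line 14 applies a southcentralus default (the exact line is not quoted in this thread) and using a hypothetical stub in place of get_random_region:

```shell
# Hypothetical stand-in for the script's get_random_region helper;
# it returns a fixed value here to keep the sketch deterministic.
get_random_region() {
  echo "westus2"
}

# Simulate a caller who did not supply a location.
unset AZURE_LOCATION

# Assumed shape of the earlier assignment ("line 14"): apply a fixed default.
AZURE_LOCATION="${AZURE_LOCATION:-southcentralus}"

# AZURE_LOCATION is already non-empty at this point, so the ":-" fallback
# to the random region is never used.
location="$(get_random_region)"
AZURE_LOCATION="${AZURE_LOCATION:-$location}"

echo "AZURE_LOCATION=${AZURE_LOCATION}"
```

Under these assumptions the random region is computed but discarded, which matches the behavior described above.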

Hmm, it seems like line 14 should be removed and the get_random_region call hoisted up higher. It seems that the RG and storage account should be in the same place. I'm not sure why we would always have the RG in southcentralus and the storage accounts in other places. @mboersma any ideas?

This was code added in #1005 that was intended to spread the storage accounts around, since they don't have to live in the same region as their resource group. But it obviously didn't take into account disruption.

The problem (at least for the CAPZ DevOps Pipeline) is that we need the current behavior to be maintained: don't supply AZURE_LOCATION, get the RG in the default southcentralus and the SA in a random region.

Is it possible to check if a CI flag is set and only then randomize the region?

Maybe that's the easiest solution:

if [ "$RANDOMIZE_STORAGE_ACCOUNT" == "true" ]; then
  export AZURE_LOCATION="$(get_random_region)"
fi

cc: @willie-yao

Yup, I think that's the best solution. The original intention was to spread the storage accounts around so that not all of our tests are hosted in the same region. Does adding an environment variable like $RANDOMIZE_STORAGE_ACCOUNT work for you? @kkeshavamurthy
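The flag-gated approach could be sketched roughly as follows. The resolve_storage_location function and the get_random_region stub are hypothetical names used only for illustration, and the southcentralus fallback is assumed from the default mentioned earlier in this thread; this is not the project's actual script.

```shell
# Hypothetical stand-in for the script's get_random_region helper.
get_random_region() {
  echo "westus2"
}

# Hypothetical helper: print the storage account location.
#   $1: user-supplied location (may be empty)
#   $2: value of RANDOMIZE_STORAGE_ACCOUNT (may be empty)
resolve_storage_location() {
  if [ "${2:-false}" = "true" ]; then
    # CI opts in explicitly, so the random region wins.
    get_random_region
  else
    # Otherwise honor the user's value, falling back to the assumed default.
    echo "${1:-southcentralus}"
  fi
}

AZURE_LOCATION="$(resolve_storage_location "${AZURE_LOCATION:-}" "${RANDOMIZE_STORAGE_ACCOUNT:-}")"
export AZURE_LOCATION
```

With this shape, a pipeline that wants spreading would set RANDOMIZE_STORAGE_ACCOUNT=true, while a user who exports AZURE_LOCATION and leaves the flag unset keeps the chosen region.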

Why does the RG have to be in southcentralus? Can't we randomize that as well?

In the CAPZ pipeline, we initially (years ago) took the default southcentralus and the resource group cluster-api-images. So now that's our "build area", where we know to check for artifacts and where cleanup scripts watch. If we were to randomize the location, we'd have to randomize the RG name as well so they don't collide.

After this stage is the "test" stage, which uses the SA to create a managed image and test it in a k8s workload cluster. That all has to live in one AZURE_LOCATION, and the testing is what we need to distribute among locations, as @willie-yao pointed out.