cloudspannerecosystem/autoscaler

Scaler throws Permission denied when run with the spanner.admin role


Hi Team,
We recently deployed the autoscaler components on our GKE clusters in GCP to autoscale our regional (europe-west2) Spanner instance. All the deployments and CronJobs are working as expected, but the scaler deployment is not able to scale the Spanner nodes up or down: the pod logs show "Error 7 PERMISSION DENIED".

We created the scaler and poller service accounts with different names (not poller-sa and scaler-sa as suggested in the GKE Terraform documentation) and granted them predefined roles: roles/spanner.viewer to the poller service account and roles/spanner.admin to the scaler service account. Both use Workload Identity, annotated with the GCP service accounts in the deployment YAML.

Please let us know if we are missing something, or whether this is a known issue.

Hi -- thanks for the report. Certainly there's nothing unique or special about the service accounts that are configured by default in Terraform, so here are some things to check with the customization:

  • Is the scaler correctly receiving the details of the Spanner instances to scale from the poller? It sounds like this is the case, and if so, this suggests that the poller configuration is correct and that the issue lies with the configuration of the scaler.
  • Has the GKE Workload Identity policy binding been configured for the scaler SA for the correct namespace with the correct KSA and GSA? Equivalent to the Terraform here (the binding is needed as well as the annotation).
  • The lowest level the role roles/spanner.admin can be granted is at the project level -- does this match your configuration?
  • As well as the Spanner permissions, does the scaler service account have permissions on the state store you are using, either Firestore or Spanner? Based on a previous issue I suspect you are using Spanner, but thought I would ask all the same.
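The Workload Identity and grant-level checks above can be sketched in Python. This is a hypothetical helper, not part of the autoscaler: policies are plain dicts shaped like IAM getIamPolicy responses, and every project, namespace, and account name is a placeholder. It relies on two GKE Workload Identity conventions: the KSA carries the iam.gke.io/gcp-service-account annotation, and the GSA's own policy grants roles/iam.workloadIdentityUser to serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME].

```python
# Hypothetical helper: checks the scaler's Workload Identity wiring and the
# level at which roles/spanner.admin is granted. All names are placeholders;
# policies are plain dicts shaped like IAM getIamPolicy responses.

WI_ROLE = "roles/iam.workloadIdentityUser"
SPANNER_ADMIN = "roles/spanner.admin"
KSA_ANNOTATION = "iam.gke.io/gcp-service-account"


def wi_member(project_id: str, namespace: str, ksa: str) -> str:
    """Member string that identifies a Kubernetes service account to IAM."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def has_binding(policy: dict, role: str, member: str) -> bool:
    """True if `member` is listed under `role` in the policy's bindings."""
    return any(
        b.get("role") == role and member in b.get("members", [])
        for b in policy.get("bindings", [])
    )


def check_scaler(project_id, namespace, ksa, gsa_email,
                 gsa_policy, ksa_annotations, grants):
    """Return a list of problems; `grants` maps resource name -> IAM policy."""
    problems = []
    # 1. The KSA must be annotated with the GSA it impersonates.
    if ksa_annotations.get(KSA_ANNOTATION) != gsa_email:
        problems.append("KSA annotation missing or names a different GSA")
    # 2. The GSA's own policy must let that namespace/KSA pair act as it.
    if not has_binding(gsa_policy, WI_ROLE, wi_member(project_id, namespace, ksa)):
        problems.append("GSA policy has no workloadIdentityUser binding for this KSA")
    # 3. Per this thread, spanner.admin must be granted on the project,
    #    not on projects/<id>/instances/<instance>.
    member = f"serviceAccount:{gsa_email}"
    found_on = [res for res, pol in grants.items()
                if has_binding(pol, SPANNER_ADMIN, member)]
    if f"projects/{project_id}" not in found_on:
        problems.append(
            f"{SPANNER_ADMIN} not granted at project level (found on: {found_on})")
    return problems


if __name__ == "__main__":
    project, ns, ksa = "my-project", "my-namespace", "my-scaler-ksa"
    gsa = "my-scaler@my-project.iam.gserviceaccount.com"
    for issue in check_scaler(
        project, ns, ksa, gsa,
        gsa_policy={"bindings": [{"role": WI_ROLE,
                                  "members": [wi_member(project, ns, ksa)]}]},
        ksa_annotations={KSA_ANNOTATION: gsa},
        # Instance-level grant only, as in the original report.
        grants={f"projects/{project}/instances/my-instance": {
            "bindings": [{"role": SPANNER_ADMIN,
                          "members": [f"serviceAccount:{gsa}"]}]}},
    ):
        print(issue)
```

With an instance-level grant and a correct Workload Identity setup, only the third check reports a problem, which matches the cause found later in this thread.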

If you are able to share the context of the error(s) from the scaler component logs and your config YAML (with any sensitive information redacted), then this would be most helpful in diagnosing the issue. Thanks!

Hi Henry,

Thanks for your prompt reply!
I was able to trace the issue to your point #3 above: we had granted spanner.admin at the instance level, which is not correct.
Since a project-level grant would cause a security compliance issue for us, we will go ahead and create a custom IAM role as suggested in the documentation.
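For reference, a custom role can be described as the request body of the IAM projects.roles.create method: a roleId plus a Role with title, description, includedPermissions, and stage. The sketch below is hypothetical. In particular, the permission list is an assumption for illustration only, and any permissions needed on the state store (the fourth point above) would have to be added; check the autoscaler documentation for the exact set.

```python
import json

# ASSUMPTION: an illustrative minimal set for reading and resizing an
# instance; verify against the autoscaler documentation before use.
SCALER_PERMISSIONS = [
    "spanner.instances.get",
    "spanner.instances.update",
]


def custom_role(role_id: str, title: str, permissions: list) -> dict:
    """Build a projects.roles.create request body (hypothetical role_id/title)."""
    return {
        "roleId": role_id,
        "role": {
            "title": title,
            "description": "Read and resize Spanner instances (autoscaler scaler)",
            # De-duplicated and sorted so the definition is stable to diff.
            "includedPermissions": sorted(set(permissions)),
            "stage": "GA",
        },
    }


if __name__ == "__main__":
    body = custom_role("spannerAutoscalerScaler", "Spanner autoscaler scaler",
                       SCALER_PERMISSIONS)
    print(json.dumps(body, indent=2))
```

Granting such a role to the scaler service account at the project level keeps the grant at the level this thread identified, while limiting it to the listed permissions instead of everything in roles/spanner.admin.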

Closing this as the issue has been resolved.