google/caliban

Documentation: Caliban Default Creds

jordanrule opened this issue · 8 comments

Couple issues

  1. gcloud service account credentials now required (as .caliban_default_creds) but not documented

  2. Even when I supply working credentials, gcloud auth inside Docker is not working:

Step 9/18 : RUN gcloud auth activate-service-account --key-file=/.creds/credentials.json &&   git config --global credential.'https://source.developers.google.com'.helper gcloud.sh
 ---> Running in ef0260e778f6
ERROR: (gcloud.auth.activate-service-account) The .json key file is not in a valid format.
The command '/bin/sh -c gcloud auth activate-service-account --key-file=/.creds/credentials.json &&   git config --global credential.'https://source.developers.google.com'.helper gcloud.sh' returned a non-zero code: 1
E0820 10:49:44.640692 4603461056 main.py:165] Docker failed with error code 1.

Confirmation it works external to Docker:

gcloud auth activate-service-account --key-file ...
Activated service account credentials for: [...]

I got past this by passing in --cloud_key - seems like it should be added to Getting Started, can open a PR if desired.

OK, now I see the documentation: https://caliban.readthedocs.io/en/latest/cloud/service_account.html , and that the documentation just assumes GOOGLE_APPLICATION_CREDENTIALS is set to a service account (not user account). None of this is a big deal, just some clarification on credentials would be good so people (particularly scientists) don't bounce before getting everything running or at least knowing what to ask for. Some clarification on why credentials are needed to run locally would be good too?

Hey @jordanrule! Thanks for this report - this is definitely a bug if it's on Caliban 0.3.0 or later. Can I ask what version of Caliban you're on? There did indeed use to be a service account credential requirement, before I figured out how I could hijack the normal gcloud auth login credentials to do job submissions. That's why I dropped the service account requirements from the main docs.

I think that the code should work the way you expect (with no requirement here) on the latest published version of Caliban.

You've also discovered an internal trick we use. Here's my current understanding of how Google SAYS credentials work:

  • If you set GOOGLE_APPLICATION_CREDENTIALS, it has to point to a service account JSON key, as you've described, not anything else.
  • That key almost always lives in some directory like ~/.config, somewhere user-local.
  • A Docker build isn't able, by design, to access any files outside of the folder where it's running (its build context). So, as a hack, if you HAVE set that environment variable, we create a temporary copy of the key in the project folder, then delete it when the caliban command finishes running. It's not that elegant... I would use a random temporary filename, but that would bust the Docker cache. That's why you see the hidden filename .caliban_default_creds; that's the static name of the temp file.
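The copy-then-delete hack described in the last bullet can be sketched roughly as follows. This is only an illustration under assumptions, not Caliban's actual code: the helper name `staged_credentials` and its `env` parameter are hypothetical, and only the filename `.caliban_default_creds` and the environment variable come from the thread.

```python
import os
import shutil
from contextlib import contextmanager

# Static name taken from the thread. A random name would change the build
# context on every run and invalidate Docker's layer cache.
TEMP_CREDS_NAME = ".caliban_default_creds"


@contextmanager
def staged_credentials(project_dir, env=os.environ):
    """Temporarily copy the key named by GOOGLE_APPLICATION_CREDENTIALS
    into project_dir so a Docker build can see it.

    Yields the path of the copy, or None if the variable is unset.
    Hypothetical helper for illustration only.
    """
    source = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not source:
        yield None
        return

    target = os.path.join(project_dir, TEMP_CREDS_NAME)
    shutil.copyfile(source, target)
    try:
        yield target
    finally:
        # Remove the copy even if the build inside the `with` block fails.
        if os.path.exists(target):
            os.remove(target)
```

Wrapping the build in `with staged_credentials(project_dir):` would keep the copy around only for the duration of the command, which matches the behavior described above.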

What to do?

So how to solve your problem? I think if you upgrade Caliban, this will go away, assuming you've got GOOGLE_APPLICATION_CREDENTIALS pointed at a valid service account key (and that you delete your copy of the temporary file, so Caliban can write and delete it).

If you've got GOOGLE_APPLICATION_CREDENTIALS pointed at any OTHER type of creds, like the "application default credentials", then you'll see failures, I expect.
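One way to tell the two kinds of file apart before Docker ever runs is to look at the top-level "type" field in the JSON. The sketch below assumes that service account keys carry `"type": "service_account"` while the application default credentials file written by gcloud for a user carries `"type": "authorized_user"`; the function names are hypothetical and this is not a check Caliban is known to perform.

```python
import json


def credential_type(path):
    """Return the top-level "type" field of a credentials JSON file,
    or None if the file is not a JSON object or lacks that field."""
    try:
        with open(path) as f:
            data = json.load(f)
    except ValueError:
        # Not valid JSON at all.
        return None
    if not isinstance(data, dict):
        return None
    return data.get("type")


def is_service_account_key(path):
    """True only for files that declare themselves a service account key."""
    return credential_type(path) == "service_account"
```

A check like `is_service_account_key(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])` could turn the "not in a valid format" failure from the build log into an earlier, clearer message.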

Okay, let me know what you think! Let me know if I've misunderstood anything, and especially if the problem persists after an upgrade. The credentials part of all of this is annoying and we want to make the experience easy for folks.

jordandrule$ caliban --version
caliban 0.3.0

So here's the issue from my perspective: I don't think requiring users to set GOOGLE_APPLICATION_CREDENTIALS to a service account is a good best practice (yes that's what I ended up doing after running into the same issue with caliban cloud). In a corporate environment we want to be able to know who is racking up the BQ charges, and we can't do that if all the scientists are using the same service account. Telling them to unset GOOGLE_APPLICATION_CREDENTIALS after every deployment is a workaround but not ideal.

I might recommend a separate environment variable, GOOGLE_APPLICATION_SERVICE_CREDENTIALS, that Caliban would read from if it is set, falling back to GOOGLE_APPLICATION_CREDENTIALS only if it is not.
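The proposed lookup order might look something like this. Note that GOOGLE_APPLICATION_SERVICE_CREDENTIALS is only the name suggested in this comment, not a variable that Caliban or Google tooling is known to read, and `resolve_key_path` is a hypothetical helper.

```python
import os


def resolve_key_path(env=os.environ):
    """Return the service account key path using the proposed precedence:
    the dedicated (proposed) variable first, then the standard one.

    Returns None if neither variable is set to a non-empty value.
    """
    return (
        env.get("GOOGLE_APPLICATION_SERVICE_CREDENTIALS")
        or env.get("GOOGLE_APPLICATION_CREDENTIALS")
        or None
    )
```

With this ordering a user could leave GOOGLE_APPLICATION_CREDENTIALS untouched for other tools and point only the dedicated variable at a service account key.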

Can you provide more detail on hijacking google auth login and not requiring service credentials? I am on 0.3.0 and caliban run is not working without service creds. An ideal on-boarding process would allow a scientist to run locally and become comfortable with usability before having to bug engineering for a service account.

Hey @jordanrule ,

I agree with you here. But the GOOGLE_APPLICATION_CREDENTIALS variable ONLY works when pointing at a service account key: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable

That's the only purpose of that variable, so if you want to attribute billing to individual users, I think it makes sense to not use this variable at all, and to have folks use gcloud auth login to authenticate once. Make sense?

The "trick" is that we shell out to gcloud auth print-access-token and use that token for AI Platform authentication. So, if gcloud auth list shows that you're authenticated everything should work.
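For readers curious what that looks like, here is a minimal sketch of shelling out for a token, assuming gcloud is on the PATH and the user has already run gcloud auth login. The function names and the injectable `run` parameter are illustrative assumptions, and how the token is attached to AI Platform requests in Caliban itself is not shown in this thread.

```python
import subprocess


def gcloud_access_token(run=subprocess.run):
    """Ask the gcloud CLI for an access token for the active account.

    `run` is injectable so the function can be exercised without gcloud
    installed; by default it is subprocess.run.
    """
    result = run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud exits non-zero
    )
    return result.stdout.strip()


def auth_header(token):
    """Build a standard OAuth 2.0 bearer header from the token."""
    return {"Authorization": f"Bearer {token}"}
```

Because the token belongs to whichever account gcloud auth list marks as active, requests made this way are tied to the individual user rather than to a shared service account.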

Seeing this above:

> and that the documentation just assumes GOOGLE_APPLICATION_CREDENTIALS is set to a service account (not user account)

Can you point me to any documentation you find that describes this use of GOOGLE_APPLICATION_CREDENTIALS? I'm not sure what "pointing to a user account" means here. Thanks!

Aha - you have taught me the distinction between ADC and a service account key, and now I see that caliban runs without GOOGLE_APPLICATION_CREDENTIALS even being set.

For some reason I had set GOOGLE_APPLICATION_CREDENTIALS to application_default_credentials.json (who knows why, given the documentation you pointed out states that should never work), which was causing the problem.

Thank you for your patience. I am demoing caliban on Monday so at least you have given me the confidence to explain different Google authentication strategies. Feel free to close this.

It’s all terribly confusing... and to make it more confusing, it DOES work to do what you say for a few random binaries Google produces, I think by accident.

Also, I’m not sure that the docs make this clear, but Caliban will in fact copy application default creds into the container if you have them set. This is nice for getting in-container authorized for buckets etc.

I’ll go ahead and close, but please keep asking questions! Happy to help, and I’m sure you’ll expose some error at some point, which will be quite helpful.

Cheers,
Sam