AWS calls hang when running in CI
I can use this lib for all operations on my local machine. Thanks for creating it.
When running in CI (Bitbucket Pipelines, Docker), all calls block and never return, eventually hitting the CI timeout.
I first discovered this calling STS, but then switched to S3 :ListBuckets: same behaviour.
Is there some kind of logging or other diagnostic tool I can use to debug this?
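For reference, this is roughly the kind of call that hangs (a minimal sketch; the namespaces follow the awyeah-api README and the region is just a placeholder):

```clojure
;; Minimal sketch of the hanging call; region is a placeholder.
(require '[com.grzm.awyeah.client.api :as aws])

(def s3 (aws/client {:api :s3, :region "us-east-1"}))

;; Works on my local machine; in CI this never returns and the pipeline
;; eventually hits its timeout.
(prn (aws/invoke s3 {:op :ListBuckets}))
```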
Here's the log from CI that installs bb with deps, in case it helps...
- ./scripts/ensure-bb.sh 0.8.156
missing. Installing...
/opt/atlassian/pipelines/agent/build
./bb.tar.gz: 70.7% -- replaced with ./bb.tar
bb
Could not find /root/.deps.clj/1.11.1.1113/ClojureTools/clojure-tools-1.11.1.1113.jar
Downloading tools jar from https://download.clojure.org/install/clojure-tools-1.11.1.1113.zip to /root/.deps.clj/1.11.1.1113/ClojureTools
Cloning: https://github.com/grzm/awyeah-api
Checking out: https://github.com/grzm/awyeah-api at a3ce8c5
Cloning: https://github.com/babashka/spec.alpha
Checking out: https://github.com/babashka/spec.alpha at 433b0778e2c32f4bb5d0b48e5a33520bee28b906
Downloading: com/cognitect/aws/sts/822.2.1145.0/sts-822.2.1145.0.pom from central
Downloading: com/cognitect/aws/endpoints/1.1.12.230/endpoints-1.1.12.230.pom from central
Downloading: com/cognitect/aws/lambda/822.2.1145.0/lambda-822.2.1145.0.pom from central
Downloading: org/clojure/clojure/1.11.1/clojure-1.11.1.pom from central
Downloading: com/cognitect/aws/s3/822.2.1145.0/s3-822.2.1145.0.pom from central
Downloading: com/cognitect/aws/cloudfront/822.2.1145.0/cloudfront-822.2.1145.0.pom from central
Downloading: org/clojure/spec.alpha/0.3.218/spec.alpha-0.3.218.pom from central
Downloading: org/clojure/core.specs.alpha/0.2.62/core.specs.alpha-0.2.62.pom from central
Downloading: org/clojure/pom.contrib/1.1.0/pom.contrib-1.1.0.pom from central
Downloading: org/babashka/cli/0.2.22/cli-0.2.22.pom from clojars
Downloading: funcool/promesa/8.0.450/promesa-8.0.450.pom from clojars
Downloading: com/cognitect/aws/endpoints/1.1.12.230/endpoints-1.1.12.230.jar from central
Downloading: com/cognitect/aws/lambda/822.2.1145.0/lambda-822.2.1145.0.jar from central
Downloading: org/clojure/core.specs.alpha/0.2.62/core.specs.alpha-0.2.62.jar from central
Downloading: org/clojure/spec.alpha/0.3.218/spec.alpha-0.3.218.jar from central
Downloading: org/clojure/clojure/1.11.1/clojure-1.11.1.jar from central
Downloading: com/cognitect/aws/sts/822.2.1145.0/sts-822.2.1145.0.jar from central
Downloading: org/babashka/cli/0.2.22/cli-0.2.22.jar from clojars
Downloading: com/cognitect/aws/s3/822.2.1145.0/s3-822.2.1145.0.jar from central
Downloading: funcool/promesa/8.0.450/promesa-8.0.450.jar from clojars
Downloading: com/cognitect/aws/cloudfront/822.2.1145.0/cloudfront-822.2.1145.0.jar from central
babashka v0.8.156
bb is operating normally, i.e. the steps prior to the AWS calls are fine, e.g. using org.babashka/cli to parse args.
My next step would be to reproduce this in a local Docker container, but then I'll need some way to dig into the blocking AWS calls.
One thing worth noting is that I'm using newer AWS lib versions than the ones you list in the docs. This is because the STS lib didn't offer a version matching the listed ones, so I upgraded them all to 822.2.1145.0.
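For reference, here are the deps from the install log above, sketched as a bb.edn (the awyeah-api sha is abbreviated here exactly as it appears in the log):

```clojure
{:deps {com.grzm/awyeah-api          {:git/url "https://github.com/grzm/awyeah-api"
                                      :git/sha "a3ce8c5"} ; abbreviated sha, as logged above
        org.babashka/spec.alpha      {:git/url "https://github.com/babashka/spec.alpha"
                                      :git/sha "433b0778e2c32f4bb5d0b48e5a33520bee28b906"}
        org.babashka/cli             {:mvn/version "0.2.22"}
        com.cognitect.aws/endpoints  {:mvn/version "1.1.12.230"}
        com.cognitect.aws/sts        {:mvn/version "822.2.1145.0"}
        com.cognitect.aws/s3         {:mvn/version "822.2.1145.0"}
        com.cognitect.aws/lambda     {:mvn/version "822.2.1145.0"}
        com.cognitect.aws/cloudfront {:mvn/version "822.2.1145.0"}}}
```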
More info: in AWS I can see that the credentials have never been used, i.e. the call did not complete the authentication phase. I guess this means the block is in the credentials provider chain?
Tried the latest commit/sha: same block.
Tried removing the creds from AWS (i.e. keys not active): same block.
Tried removing the key env vars from the environment. Got the following logs...
testing aws..
2022-06-25T00:45:22.787Z cef7caac-cc7e-433b-9ee4-eadd58adbf33-mlgfs INFO [com.grzm.awyeah.credentials:?] - Unable to fetch credentials from environment variables.
2022-06-25T00:45:22.790Z cef7caac-cc7e-433b-9ee4-eadd58adbf33-mlgfs INFO [com.grzm.awyeah.credentials:?] - Unable to fetch credentials from system properties.
...but then the same block.
There was an issue where exceptions from the http-client weren't being handled properly. I could trigger the hanging behavior you describe, independent of any container, whenever the default credentials provider falls through to the instance-profile-credentials-provider.
(See awyeah-api/src/com/grzm/awyeah/credentials.clj, lines 284 to 300 at 2d046da.)
If you don't have an EC2 metadata host available (e.g., you're not running in EC2), the http-client would throw, the exception wouldn't be handled properly, and the call would hang.
I suspect this is what's happening in your case as well. The fix in 0fa7dd5 resolved the issue I was seeing, and I think it will likely fix what you're seeing too, or at least stop the hang.
If you're continuing to see an issue, please provide a small, isolated, reproducible test case that exhibits the error you're seeing. If you're only seeing the behavior in a container, a minimal Dockerfile would be really helpful as well. FWIW, I tried to fetch https://bitbucket.org/nextdoc/nd-client-editor/src/954eaf5a57658eb5a61573894c7949b32cafdb20/scripts/ensure-bb.sh but got "Repository Not Found."
If you're having issues with credentials specifically, you can supply one of the credentials providers directly rather than relying on the default chain, or you can create your own credentials provider.
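For example, something along these lines (a sketch only, assuming the cognitect-style basic-credentials-provider mirrored in com.grzm.awyeah.credentials is what you want; the region and env var names depend on your setup):

```clojure
;; Sketch: supply a credentials provider explicitly instead of relying on
;; the default provider chain. Region and env var names are placeholders.
(require '[com.grzm.awyeah.client.api :as aws]
         '[com.grzm.awyeah.credentials :as credentials])

(def s3
  (aws/client
   {:api :s3
    :region "us-east-1"
    :credentials-provider
    (credentials/basic-credentials-provider
     {:access-key-id     (System/getenv "AWS_ACCESS_KEY_ID")
      :secret-access-key (System/getenv "AWS_SECRET_ACCESS_KEY")})}))
```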
Hopefully this helps. Let me know if it doesn't.
I'll close this issue. If you do have further problems, just open another one with a reproducible test case. Thanks!
Thanks. I'll try it out when I next work on CI and will confirm the fix back here.
Finally got back to this and can confirm that assume-role now works by creating a reified CredentialsProvider (rough sketch below).
Thanks for the fix!
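In case it helps anyone else, here's roughly what I ended up with (a sketch only; the role ARN, session name, and region are placeholders, and it doesn't cache or refresh the temporary credentials):

```clojure
;; Rough sketch of a reified CredentialsProvider that calls STS AssumeRole
;; and hands the temporary credentials to the S3 client.
;; Placeholders: role ARN, session name, region. No caching or refresh.
(require '[com.grzm.awyeah.client.api :as aws]
         '[com.grzm.awyeah.credentials :as credentials])

(def sts (aws/client {:api :sts, :region "us-east-1"}))

(def assume-role-provider
  (reify credentials/CredentialsProvider
    (fetch [_]
      (let [{:keys [Credentials]}
            (aws/invoke sts {:op :AssumeRole
                             :request {:RoleArn "arn:aws:iam::123456789012:role/ci-role"
                                       :RoleSessionName "ci-session"}})]
        {:aws/access-key-id     (:AccessKeyId Credentials)
         :aws/secret-access-key (:SecretAccessKey Credentials)
         :aws/session-token     (:SessionToken Credentials)}))))

(def s3 (aws/client {:api :s3
                     :region "us-east-1"
                     :credentials-provider assume-role-provider}))
```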