ducktors/turborepo-remote-cache

HTTP Error 412 while using recent Turbo versions

trappar opened this issue · 17 comments

๐Ÿ› Bug Report

I updated Turbo from 1.10.16 to 1.12.5 today and started seeing this in CI:

 WARNING  failed to contact remote cache: Error making HTTP request: HTTP status client error (412 Precondition Failed) for url (http://0.0.0.0:45045/v8/artifacts/e946449d9e73b6d1?slug=ci)

There is some discussion around this in this turbo issue, where people mention that this is likely due to using this remote cache server along with S3 specifically.

To Reproduce

I doubt that it will be possible for me to create reproduction instructions / repo for this issue considering that others have failed to reliably reproduce this in the thread above.

Expected behavior

Turbo should be able to reach the remote cache without HTTP status errors.

Your Environment

  • Using this package with my GH action. I'm specifically using trappar/turborepo-remote-cache-gh-action@v2, which is a new version I've been working on in order to support the up-to-date version of this package.
  • Turbo version 1.12.5
  • Seeing the failures in CI while using GitHub Actions, where I'm using ubuntu-latest
  • Server is configured to connect to an S3 bucket

We've been having the same issue using Azure. We're locked in at Turbo v1.10 for now, until we have time to fully investigate.

Super weird. We're using a remote-cache server and Turbo 1.12.5 in other projects, and so far, we haven't had any problems at all

I've found what is causing this in my particular case.

I have two different workflows. I was only seeing this issue appear in one of them.

Both of them utilize my GH Action to start a cache server like this:

- uses: trappar/turborepo-remote-cache-gh-action@v2
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  with:
    storage-provider: s3
    storage-path: turborepo-cache

However, the one that was failing had the following preceding it:

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::${{ secrets.ACCOUNT_ID }}:role/[REDACTED]
    aws-region: us-east-1

If I simply switch the order of these so that the remote cache server starts before configuring AWS credentials, then the error disappears.

So this may or may not be a bug depending on which credentials should take precedence. I assumed that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables would take precedence over anything else, but that is clearly incorrect. I'm not actually 100% sure what the output of that aws action is (does it create an ~/.aws/credentials file or something?), but it looks like it's taking over, and those credentials don't have permission to view the S3 bucket I'm telling the cache server to use.
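One plausible explanation, not confirmed in this thread: as far as I know, aws-actions/configure-aws-credentials exports AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN as environment variables for later steps. A step-level env block that overrides only the first two would leave the assumed role's session token in place, so the server would end up with a mismatched credential set. The TypeScript sketch below is a hypothetical diagnostic (diagnoseAwsEnv is an illustrative name, not part of turborepo-remote-cache) that flags that combination using the documented AKIA/ASIA access key ID prefixes.

```typescript
// Hypothetical diagnostic, not part of turborepo-remote-cache.
// Reports which AWS credential variables a process can see and flags
// combinations that cannot form a valid credential set.
type Env = Record<string, string | undefined>;

function diagnoseAwsEnv(env: Env): string[] {
  const findings: string[] = [];
  const keyId = env.AWS_ACCESS_KEY_ID;
  const secret = env.AWS_SECRET_ACCESS_KEY;
  const token = env.AWS_SESSION_TOKEN;

  if (!keyId || !secret) {
    findings.push(
      "No static key pair in the environment; the SDK will fall back to other credential sources."
    );
    return findings;
  }

  // Long-term IAM user keys start with "AKIA"; temporary STS keys start with "ASIA".
  if (keyId.startsWith("AKIA") && token) {
    findings.push(
      "Long-term access key combined with AWS_SESSION_TOKEN; the token probably belongs to a different credential set."
    );
  }
  if (keyId.startsWith("ASIA") && !token) {
    findings.push(
      "Temporary access key without AWS_SESSION_TOKEN; requests signed with it will be rejected."
    );
  }
  return findings;
}
```

Calling diagnoseAwsEnv(process.env) at server startup and logging the result would show whether a leftover session token is present. If it is, setting AWS_SESSION_TOKEN to an empty string in the same step-level env block might be an alternative to reordering the steps, though that is untested here.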

Regardless of whether this app is doing something wrong, there does seem to be room to improve the handling of this authentication failure case. These 412 Precondition Failed errors are super opaque for an end user.

Maybe someone who knows more about AWS authentication could help here?

I don't think wontfix is necessarily appropriate here for two reasons:

  1. This only appears upon switching to Turbo versions above 1.10.16. Why is this same setup valid on 1.10.16? Seems like there's more to the story that's worth investigating.
  2. The error handling needs to be improved.

just tried with turbo v1.13... 412 still persists :(

Does anybody know of alternatives to ducktors / turborepo-remote-cache that don't have this issue?

I'm seeing this as well, 1.10.16 works but newer versions do not.

last compatible turbo version is 1.12.0, for me/us

@NullVoxPopuli, can you please provide a repro repo? I have enough time to investigate this properly, but I can't reproduce it myself.

Is there a Discord or something? I don't want to spam everyone while I debug. I saw a "request body too large" error at one point while I was looking at the server logs, but I don't know if that's the problem. I'm still trying to figure out some filter syntax 😅

Yes, we have a nonpublic Discord that we set up a while ago, but I would love to have you join! https://discord.gg/PCnY8BEg

I don't think wontfix is necessarily appropriate here for two reasons:

  1. This only appears upon switching to Turbo versions above 1.10.16. Why is this same setup valid on 1.10.16? Seems like there's more to the story that's worth investigating.
  2. The error handling needs to be improved.

You are correct in saying this. As I said, we need a repro to investigate further. I also agree with the "better error handling" part you said.

@fox1t Ok, I can't guarantee that the cause for me is the same as the cause for everyone else, but here's a minimal reproduction:

https://github.com/exogee-technology/turborepo-remote-cache-323-reproduction

Let me know if you need anything else!

Just wanted to confirm that as a workaround I've done the following in my deployment:

	// Create an access key and secret key for the service to access the bucket as a workaround for
	// https://github.com/ducktors/turborepo-remote-cache/issues/323
	const user = new User(this, 'Issue323WorkaroundUser');
	const accessKey = new CfnAccessKey(this, 'Issue323WorkaroundUserAccessKey', {
		userName: user.userName,
	});
	bucket.grantReadWrite(user);
	
	const service = new ApplicationLoadBalancedEc2Service(this, 'TurborepoCacheService', {
		taskImageOptions: {
			environment: {
				S3_ACCESS_KEY: accessKey.ref,
				S3_SECRET_KEY: accessKey.attrSecretAccessKey,
				// etc
			},
			// etc
		},
		// etc
	});

And this works fine. So it does really seem to be "When you don't pass secret key and access key, the server is unable to assume the execution role of the task as it should by default."

Awesome! How can we fix this directly in the app?

Personally I'd start by upgrading from aws-sdk v2 to v3. I'm not sure that'd fix it, but it might, and if it didn't then we'd have a better chance of working with AWS to figure out the root cause.

It'd also be good to catch this error more specifically and log out what's happening in a less cryptic way in this scenario.
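As a rough illustration of what less cryptic logging could look like, here is a minimal TypeScript sketch. It assumes the storage error exposes an S3 error code either as `code` (the aws-sdk v2 convention) or as `name` (the v3 convention). The function explainStorageError, the hint table, and the message wording are all hypothetical and do not come from the turborepo-remote-cache codebase.

```typescript
// Hypothetical helper, not part of turborepo-remote-cache.
// Maps well-known S3 error codes to a hint an operator can act on.
interface StorageErrorLike {
  code?: string; // aws-sdk v2 puts the S3 error code here
  name?: string; // aws-sdk v3 uses the error name instead
  message?: string;
}

const HINTS = new Map<string, string>([
  ["AccessDenied", "The credentials in use are not allowed to access this bucket or key."],
  ["InvalidAccessKeyId", "The access key ID is unknown to the storage provider."],
  ["SignatureDoesNotMatch", "The secret key does not match the access key ID."],
  ["InvalidToken", "The session token is malformed or belongs to other credentials."],
  ["ExpiredToken", "The temporary credentials have expired."],
  ["NoSuchBucket", "The configured bucket does not exist."],
]);

function explainStorageError(err: StorageErrorLike): string {
  const code = err.code ?? err.name ?? "Unknown";
  const hint =
    HINTS.get(code) ??
    "No specific hint; check the server logs for the underlying storage error.";
  return `Storage provider error (${code}): ${hint}`;
}
```

The server could log this string next to the original error before it answers the client, so the cause is visible in the logs even if the client still only sees the 412 status.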

I was able to replicate it using a homelab server, k3s, and MinIO. I assume it is related to the S3 bucket connection:

"originalError":{"message":"write EPROTO 08C2C90647750000:error:0A000410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1590:SSL