benbjohnson/litestream

Switch to HEAD instead of get-bucket-location in S3 init routine

Closed this issue · 2 comments

AWS documents that HEAD should be used in favour of the get-bucket-location API call that litestream currently uses to determine which S3 endpoint / region to connect to: https://docs.aws.amazon.com/cli/latest/reference/s3api/get-bucket-location.html

We recommend that you use HeadBucket to return the Region that a bucket resides in. For backward compatibility, Amazon S3 continues to support GetBucketLocation.

Unfortunately the backwards compatibility support appears to be a bit spotty. In particular, there are many reports of the get-bucket-location API call failing to work reliably cross-region, which is the very situation where it is being used, e.g.
aws/aws-sdk-go#720
cloud-custodian/cloud-custodian#7695

I have a role account running under ECS that is hitting this. I know the role account permissions are fine, because litestream works if I explicitly configure the region in the config file; but when trying to use litestream without a config file on the command-line it all falls apart:

/ # litestream restore -o test.db s3://BUCKET/PATH
2024/03/21 06:42:23 ERROR failed to run error="cannot fetch generations: cannot lookup bucket region: AccessDenied: Access Denied\n\tstatus code: 403, request id: Z9QYDYMRKTC8SRPB, host id: ..."

This matches the code, where the problematic get-bucket-location call is bypassed when a region is specified in the config:

region := c.Region
if region == "" {
	if c.Endpoint == "" {
		if region, err = c.findBucketRegion(ctx, c.Bucket); err != nil {
			return fmt.Errorf("cannot lookup bucket region: %w", err)
		}
	} else {
		region = DefaultRegion // default for non-S3 object stores
	}
}

Unfortunately, I cannot find any way to set a similar option from the command line alone.

Attempting to replicate the behaviour with the raw aws CLI shows the issue:

/ # aws s3api get-bucket-location --bucket BUCKET --output text | cat
ap-southeast-2
/ # AWS_REGION=us-east-1 aws s3api get-bucket-location --bucket BUCKET --output text | cat

An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied

It's not that litestream is doing anything "wrong" here (other than using an old/not recommended API)... just the API is unreliable and stupid.

The recommended head-bucket call does seem to reliably work however:

/ # aws s3api head-bucket --bucket BUCKET --debug 2>&1 | grep x-amz-bucket-region
2024-03-21 07:21:13,247 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-id-2': 'viRHfLso2hnlc/saJECbtrIgrQfOWI4nDxff7QCsYUWEOucXO42Xkm8DdtsBzCz2XwFyLAut98KUvUPpRDmKZA==', 'x-amz-request-id': 'DJ92Q5D6A8717KPB', 'Date': 'Thu, 21 Mar 2024 07:21:14 GMT', 'x-amz-bucket-region': 'ap-southeast-2', 'x-amz-access-point-alias': 'false', 'Content-Type': 'application/xml', 'Server': 'AmazonS3'}

/ # aws s3api head-bucket --bucket BUCKET --endpoint https://s3.us-east-1.amazonaws.com/ --debug 2>&1| grep x-amz-bucket-region
2024-03-21 07:23:20,040 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-bucket-region': 'ap-southeast-2', 'x-amz-request-id': 'ZKZBSKCGY78EXRBC', 'x-amz-id-2': 'mPQgYi9PPQj/fW3S+gJ0qBdttaj2db9DBjdjk3/RhIJL95T9/yA2w5KUGvFNOQldZWoXtMeQNFM=', 'Content-Type': 'application/xml', 'Date': 'Thu, 21 Mar 2024 07:23:19 GMT', 'Server': 'AmazonS3'}

Any objections to changing to head instead?

I'm happy to send a PR if this change would be welcome.

hifi commented

Since it's only used against real AWS, i.e. when the endpoint is empty, I believe switching to the currently recommended call would be a good update. Please submit a PR if you have the time.

Thanks!

Thanks,

I worked out that I can work around the issue for now by using env vars in the config file:

addr: ":9090"

dbs:
  - path: ${DB_PATH}
    replicas:
      - url: ${REPLICA_URL}
        region: ${REPLICA_REGION}

and then run litestream with:

DB_PATH="/path/to/db" REPLICA_URL="s3://BUCKET/path" REPLICA_REGION=BUCKET_REGION litestream restore -if-replica-exists "/path/to/db"

a bit messy, but it works :)

I would still like to do the PR to tidy this up, but it's unlikely to get to the top of my TODO list for a few weeks now that I have a workaround.