awslabs/mountpoint-s3

Reading a file creates a Bucket GET request

Opened this issue · 2 comments

Mountpoint for Amazon S3 version

mount-s3 1.8.0

AWS Region

us-east-1

Describe the running environment

Locally running mountpoint s3 against an AWS bucket using creds stored in the environment variables. Using Ubuntu 20.04

Mountpoint options

mount-s3 <bucket> <file_dir> --endpoint-url <endpoint> --read-only --force-path-style --auto-unmount --prefix <prefix>

What happened?

I'm trying to fetch thousands of very small files, and I realized I was being rate limited by a limit on the number of Bucket GET requests. These appear to be operations Mountpoint for S3 issues to look up the location of each file on every read.

I am not listing the contents of the bucket. I know the file names beforehand and am reading each location directly without listing the folder. However, I am seeing as many Bucket GET requests as Object GET and Object HEAD requests.

Is this expected? Is there a way to avoid listing the folder every time I make a GET request?

Relevant log output

No response

vladem commented

Hi, thank you for opening the issue. I assume the "Bucket GET requests" you are seeing are unexpected ListObjectsV2 requests. You mentioned getting rate limited; are you getting 503 errors on the ListObjectsV2 requests?

Before reading a file, Mountpoint makes both a ListObjectsV2 and a HeadObject request for the specified path. This mechanism ensures the shadowing semantics (e.g. a directory dir/ "shadows" a file named dir).
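To illustrate the shadowing lookup, here is a small sketch (not Mountpoint's actual code) of how a client can resolve a path by combining a HeadObject-style existence check with a ListObjectsV2-style prefix check over a hypothetical in-memory set of keys:

```python
def resolve(path, keys):
    """Return 'directory', 'file', or None for a path, given bucket keys.

    A common prefix "path/" (a directory) shadows an object named "path",
    mirroring the semantics described above.
    """
    has_object = path in keys                                  # HeadObject(path) would succeed
    has_prefix = any(k.startswith(path + "/") for k in keys)   # ListObjectsV2(prefix=path + "/") is non-empty
    if has_prefix:
        return "directory"   # the directory shadows any object with the same name
    if has_object:
        return "file"
    return None

keys = {"dir", "dir/a.txt", "plain.txt"}
print(resolve("dir", keys))        # → directory ("dir/" shadows the object "dir")
print(resolve("plain.txt", keys))  # → file
```

Both checks are needed for a correct answer, which is why each file read issues the two requests when no cached metadata is available.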

There were related issues opened previously:

You may avoid repeated ListObjectsV2 operations for a given file by using --metadata-ttl <SECONDS>. Also, since you are already using the --prefix argument, choosing a longer prefix may reduce the number of ListObjectsV2 requests in the case of nested directories.
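For example, keeping the placeholder values from the original report, the mount command could add --metadata-ttl so repeated reads of the same path reuse cached lookup results instead of re-issuing ListObjectsV2 and HeadObject each time (the 300-second value here is illustrative):

```shell
# Cache metadata lookups for 300 seconds (illustrative value):
mount-s3 <bucket> <file_dir> \
  --endpoint-url <endpoint> \
  --read-only --force-path-style --auto-unmount \
  --prefix <prefix> \
  --metadata-ttl 300

# For a read-only workload where the bucket contents never change,
# metadata can instead be cached indefinitely:
mount-s3 <bucket> <file_dir> --read-only --metadata-ttl indefinite
```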

Lyon77 commented

Ah, I see. I missed the previous issues. I'll give it a shot with --metadata-ttl indefinite and see if it helps. Thanks!