awslabs/mountpoint-s3

Support accessing previous versions of objects


Tell us more about this new feature.

We have a use case in which we need to process customers' S3 objects using a third-party tool that only operates on POSIX-like files. We're using mountpoint-s3 very successfully to let us use this tool on potentially very large S3 objects, without having to download them locally.

However, in some cases we want to perform this processing on older versions of those objects. I don't see a way to access older versions in the FUSE filesystem as presented. So today we fall back to the S3 REST APIs to discover these older versions, and we would then have to download the objects to a local temp directory in order to process them. This is not ideal, particularly since this tool very seldom needs to actually see every byte in the file in order to do its work.
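For context, the workaround described above might look roughly like the following sketch. It assumes a boto3-style S3 client is passed in (for example `boto3.client("s3")`); the function names `list_object_versions_for_key` and `download_version` are made up for illustration and are not part of Mountpoint or any AWS SDK.

```python
import shutil
import tempfile


def list_object_versions_for_key(client, bucket, key):
    """Return the version records for exactly `key`, newest first."""
    paginator = client.get_paginator("list_object_versions")
    versions = []
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        for version in page.get("Versions", []):
            # Prefix also matches longer keys, so keep exact matches only.
            if version["Key"] == key:
                versions.append(version)
    return sorted(versions, key=lambda v: v["LastModified"], reverse=True)


def download_version(client, bucket, key, version_id):
    """Copy one version of an object to a temp file and return its path."""
    resp = client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        # The whole object is copied, even if the tool reads only a few bytes.
        shutil.copyfileobj(resp["Body"], tmp)
        return tmp.name
```

The full copy in `download_version` is the cost we'd like to avoid: with a mounted file, only the byte ranges the tool actually reads would need to be fetched.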

I'm curious whether there are any plans to provide a way to access older versions of objects. One option might be a hack like a .previous_versions/ directory at the root of the mount: each object would appear under that directory at a path matching its key, but as a directory rather than a file, and inside it would be one "file" per version, named with the S3 version ID. That's just the first idea that comes to mind; we're not picky about the details as long as prior versions of objects are surfaced as POSIX-like files.
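To make the proposed layout concrete, here is a small sketch of the key-to-path mapping. This is purely hypothetical: `.previous_versions` is just the directory name floated above, `version_path` and `build_layout` are illustrative helpers, and nothing here reflects an existing Mountpoint feature. It also assumes keys do not begin with `/`.

```python
VERSIONS_ROOT = ".previous_versions"  # hypothetical name taken from the proposal


def version_path(key, version_id, root=VERSIONS_ROOT):
    """Map (object key, version ID) to a path relative to the mount root."""
    # The object key becomes a directory; each version is a file inside it.
    return f"{root}/{key}/{version_id}"


def build_layout(versions):
    """Group (key, version_id) pairs into {directory: [version file names]}."""
    layout = {}
    for key, version_id in versions:
        layout.setdefault(f"{VERSIONS_ROOT}/{key}", []).append(version_id)
    return layout
```

For example, `version_path("logs/app.log", "abc123")` gives `.previous_versions/logs/app.log/abc123`, so a tool could open one specific version like any other file.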

Thanks for creating this issue. I just wanted to get a few clarifications: do you need to be able to view multiple versions at once, or is a point-in-time view fine? If a point-in-time view is enough, is the timestamp known at mount time?

In this scenario we would need to be able to see all versions of an object. It's not critical for us that this include new versions created since the mount, but being able to access just a single non-current version of an object isn't enough for our use case.

Thanks for sharing the use case, Adam! I see where the need to access multiple versions may be coming from.

We don't have any plans to support this right now, but I'll leave the issue open so we can gauge interest (through 👍 reactions to the issue).

As an aside, if you didn't need multiple versions, you could try using Amazon S3 Object Lambda with Mountpoint to provide a point-in-time view of your bucket. There's a blog post published in October 2024 covering that use case, although it would require you to know the point in time you want to view in advance, when you create the access point: https://aws.amazon.com/blogs/storage/access-a-point-in-time-with-amazon-s3-object-lambda/

Thanks Danny for the suggestion.

I'm afraid that solution would be a problem for us for reasons of cost. If the bucket has millions of objects in it, the addition of Object Lambda plus a database would dramatically increase the cost to process a bucket.