Capture all object metadata in tar stream
Currently we capture only the object's key and its data. To be useful as a general-purpose backup tool, we also need to preserve object metadata, including but not limited to:
- tags
- access control lists (ACLs)
- user-defined metadata
- original creation date
- version ID
When restoring, the default behavior should be to restore all possible metadata (unfortunately, it's not possible to preserve the original creation date or the version ID), with an option to restore only specific metadata components instead.
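A rough sketch of how the "restore everything by default, or only selected components" option could look. The component names (`tags`, `acl`, `user_metadata`) and the `select_metadata` helper are hypothetical, chosen only for illustration; creation date and version ID are left out of the restorable set because they can't be written back.

```python
# Hypothetical component names; only metadata that can actually be re-applied on restore.
RESTORABLE = frozenset({"tags", "acl", "user_metadata"})


def select_metadata(metadata, components=None):
    """Return the subset of captured metadata that should be restored.

    components=None means "restore everything restorable" (the proposed default).
    Otherwise only the named components are kept, and anything that is not
    restorable (e.g. creation date, version ID) is silently dropped.
    """
    wanted = RESTORABLE if components is None else RESTORABLE & set(components)
    return {name: value for name, value in metadata.items() if name in wanted}
```

For example, `select_metadata(captured, components=["tags"])` would restore tags only, while `select_metadata(captured)` keeps tags, ACLs, and user-defined metadata.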
This should be doable by storing our custom metadata in the tar archive in a separate "file" which appears in the archive before the actual object. We can make this file hidden and append a suffix like .$$metadata or something to ensure it's not confused for a real object. The extract stage would need to be modified to handle this, but that complexity would be hidden completely from the public API.
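To make the idea concrete, here is a minimal sketch using Python's standard `tarfile` module. It is not the project's actual code: the `add_object` and `extract_objects` helpers, the JSON encoding of the metadata, and the exact suffix are all assumptions for illustration, and it doesn't attempt to make the sidecar entry hidden.

```python
import io
import json
import tarfile

# Suffix suggested above; the final name is not settled.
METADATA_SUFFIX = ".$$metadata"


def add_object(tar, key, data, metadata):
    """Write a JSON metadata entry to the archive, then the object itself."""
    meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
    meta_info = tarfile.TarInfo(name=key + METADATA_SUFFIX)
    meta_info.size = len(meta_bytes)
    tar.addfile(meta_info, io.BytesIO(meta_bytes))

    obj_info = tarfile.TarInfo(name=key)
    obj_info.size = len(data)
    tar.addfile(obj_info, io.BytesIO(data))


def extract_objects(tar):
    """Yield (key, data, metadata), pairing each object with the entry before it."""
    pending = None  # (key, metadata) from the most recent metadata entry, if any
    for member in tar:
        payload = tar.extractfile(member).read()
        if member.name.endswith(METADATA_SUFFIX):
            key = member.name[: -len(METADATA_SUFFIX)]
            pending = (key, json.loads(payload))
            continue
        # Objects without a matching preceding entry get empty metadata.
        metadata = pending[1] if pending and pending[0] == member.name else {}
        pending = None
        yield member.name, payload, metadata
```

Because the metadata entry always comes immediately before its object, the extract side only needs to remember one pending entry at a time, which keeps it compatible with reading the archive as a stream. Callers of `extract_objects` never see the `.$$metadata` entries, so the public API stays unchanged.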
@kostiantyn-povnych could this metadata be stored in the ScaleZ index, as an alternative or in addition to storing it here?
Yes, it could and, in my opinion, should be stored in the ScaleZ index.
Moreover, some of the fields you listed are already supported:
- version id: this is the internal key for object versions in versioned S3 bucket metadata, which is transformed into the FS index.
- original creation date: captured and stored for all backup types.
- user-defined metadata: not yet supported, and I don't see how a user could inject custom metadata for individual files in File backup.
- tags: not done yet, but can easily be implemented by extending the FS metadata definitions.