elastio/ssstar

Capture all object metadata in tar stream

Opened this issue · 2 comments

Currently we capture only the object's key and its data. To be useful as a general-purpose backup tool, we need to preserve object metadata as well, including but not limited to:

  • tags
  • access control lists (ACLs)
  • user-defined metadata
  • original creation date
  • version ID

When restoring, the default behavior should be to restore all possible metadata (unfortunately it's not possible to restore the original creation date or the version ID), with an option to restore only specific metadata components instead.
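To make the restore option concrete, here is a minimal sketch of how the selection could be modeled. The names `MetadataComponent`, `RestoreMetadata`, and `components` are hypothetical and are not part of the existing ssstar API. The only behavior taken from this issue is that everything restorable is restored by default, and that creation date and version ID can be captured but not restored.

```rust
/// Metadata components listed in this issue (hypothetical type, not in ssstar today).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetadataComponent {
    Tags,
    Acl,
    UserMetadata,
    CreationDate,
    VersionId,
}

/// Every component that can be captured at archive time.
pub const ALL_COMPONENTS: [MetadataComponent; 5] = [
    MetadataComponent::Tags,
    MetadataComponent::Acl,
    MetadataComponent::UserMetadata,
    MetadataComponent::CreationDate,
    MetadataComponent::VersionId,
];

impl MetadataComponent {
    /// Per this issue, the creation date and version ID cannot be restored.
    pub fn is_restorable(self) -> bool {
        !matches!(
            self,
            MetadataComponent::CreationDate | MetadataComponent::VersionId
        )
    }
}

/// Hypothetical restore option: everything possible (the default), or a chosen subset.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RestoreMetadata {
    AllPossible,
    Only(Vec<MetadataComponent>),
}

impl Default for RestoreMetadata {
    fn default() -> Self {
        RestoreMetadata::AllPossible
    }
}

impl RestoreMetadata {
    /// Resolve the option to the components that will actually be restored,
    /// silently dropping anything that cannot be restored.
    pub fn components(&self) -> Vec<MetadataComponent> {
        let requested: &[MetadataComponent] = match self {
            RestoreMetadata::AllPossible => &ALL_COMPONENTS[..],
            RestoreMetadata::Only(list) => list.as_slice(),
        };
        requested
            .iter()
            .copied()
            .filter(|c| c.is_restorable())
            .collect()
    }
}
```

With this shape, `RestoreMetadata::default().components()` resolves to tags, ACLs, and user-defined metadata. Whether an explicit request for a non-restorable component should be dropped silently, as it is here, or reported as an error is an open design choice.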

This should be doable by storing our custom metadata in the tar archive as a separate "file" that appears in the archive before the actual object. We can make this file hidden and append a suffix like .$$metadata or something to ensure it's not mistaken for a real object. The extract stage would need to be modified to handle this, but that complexity would be hidden completely from the public API.
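A rough sketch of the naming and ordering convention described above, using only the standard library. The `.$$metadata` suffix comes from this issue; the function and type names (`metadata_entry_path`, `object_key_for`, `plan_entries`, `ArchiveEntry`) are made up for illustration, and the "hidden" attribute and the actual tar writing are not modeled.

```rust
/// Suffix proposed in this issue for the per-object metadata entry.
pub const METADATA_SUFFIX: &str = ".$$metadata";

/// One planned entry in the tar stream (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ArchiveEntry {
    Metadata { path: String },
    Object { path: String },
}

/// Path of the metadata entry that accompanies `object_key`.
pub fn metadata_entry_path(object_key: &str) -> String {
    format!("{}{}", object_key, METADATA_SUFFIX)
}

/// If `entry_path` names a metadata entry, return the object key it describes.
/// The extract stage could use this to tell metadata entries from real objects.
pub fn object_key_for(entry_path: &str) -> Option<&str> {
    entry_path.strip_suffix(METADATA_SUFFIX)
}

/// Plan the archive layout: each object's metadata entry comes right before the object.
pub fn plan_entries(object_keys: &[&str]) -> Vec<ArchiveEntry> {
    let mut entries = Vec::with_capacity(object_keys.len() * 2);
    for &key in object_keys {
        entries.push(ArchiveEntry::Metadata {
            path: metadata_entry_path(key),
        });
        entries.push(ArchiveEntry::Object {
            path: key.to_string(),
        });
    }
    entries
}
```

Because the metadata entry always precedes its object, the extract stage can read it, hold it in memory, and apply it when the next entry (the object itself) is written back. One case this sketch does not handle is a real object whose key already ends in `.$$metadata`; a real implementation would need an escaping rule or a different marker for that.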

@kostiantyn-povnych could this metadata be stored in the ScaleZ index, as an alternative or in addition to storing it here?

Yes, it could and, in my opinion, should be stored in the ScaleZ index.
Moreover, some of the fields you listed are already supported:

  • version ID: already the internal key for object versions in the versioned S3 bucket metadata, which is transformed into the FS index.
  • original creation date: captured and stored for all backup types.
  • user-defined metadata: not yet supported, and I don't see how the user could inject custom metadata for individual files in File backup.
  • tags: not done yet, but can be easily implemented by extending the FS metadata definitions.