treeverse/lakeFS

BUG: Multipart upload: mtime difference between storage and lakeFS can be substantial

N-o-Z opened this issue · 1 comment

For example: in S3 the mtime is determined when the multipart upload is created, while the lakeFS mtime is determined when the multipart upload completes.
For a large or slow upload this can leave a very large gap between the S3 mtime and the lakeFS mtime.

We need a generic solution that is valid for all storage adapters.
Possible solution:
Upon CompleteMultipartUpload, stat the object on the blockstore and use its mtime when creating the lakeFS entry.

To test this properly, we should consider adding a head-object interface to our block adapter.

Fortunately we can do this: GCS and Azure return this information. S3 does not, but we already headObject the generated object to get its ETag, after which the Last-Modified time is free (and guaranteed to be found).

We probably also want to straighten this out for put-object: any difference can be unpleasant for presigned URLs, and is generally confusing.