BUG: Multipart upload: mtime difference between storage and lakeFS can be substantial
N-o-Z opened this issue · 1 comments
For example, in S3 the mtime is set when the multipart upload is created, while the lakeFS mtime is set when the multipart upload completes.
For a long-running upload this can leave a very large gap between the S3 mtime and the lakeFS mtime.
We need a generic solution that is valid for all storage adapters.
Possible solution:
Upon CompleteMultipartUpload, stat the object on the blockstore and use the mtime to create the lakeFS entry.
To test this properly, we should consider adding a head-object interface to our block adapter.
Fortunately we can do this: GCS and Azure return this information. S3 does not, but we already call headObject on the generated object to get its ETag, so the Last-Modified time comes for free (and is guaranteed to be present).
We probably also want to straighten this out for put-object: any difference can be unpleasant for presigned URLs and is generally confusing.