Package.list_object_versions can return keys ending in /, breaking root.set
gdesmarais-ctx opened this issue · 2 comments
In packages.py, set_dir (around line 621) calls list_object_versions to collect every object under the directory being added. Some of the returned objects can have keys ending in /. For example, we have sequencer data that is copied into S3 through a storage gateway. The following script prints the first three objects that list_object_versions returns for the root directory of the S3 contents:
from datetime import date, datetime
from quilt3.packages import list_object_versions
import json

objects, _ = list_object_versions('celsius-sequencing', '190828_NB552139_0023_AHKCYJBGXB/')

def json_serial(obj):
    # json.dumps cannot serialize datetime values, so convert them to ISO 8601 strings.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()

# Print the first three returned objects.
for i in range(3):
    print(json.dumps(objects[i], default=json_serial, indent=2))
Results in:
{
  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
  "Size": 0,
  "StorageClass": "STANDARD",
  "Key": "190828_NB552139_0023_AHKCYJBGXB/",
  "VersionId": "L7UoB6tk.T5bH8XHzNWx63ZjgG_KvCBW",
  "IsLatest": true,
  "LastModified": "2019-08-28T20:12:19+00:00",
  "Owner": {
    "DisplayName": "aws",
    "ID": "5f378d7af9023313f9eb8f0ea138443d2d7629af0efa4c66572dfdb5360dd5c1"
  }
}
{
  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
  "Size": 0,
  "StorageClass": "STANDARD",
  "Key": "190828_NB552139_0023_AHKCYJBGXB/Config/",
  "VersionId": "7TLUsbBstpl.ei8TeSbe1GZY_mudfTXI",
  "IsLatest": true,
  "LastModified": "2019-08-28T20:12:35+00:00",
  "Owner": {
    "DisplayName": "aws",
    "ID": "5f378d7af9023313f9eb8f0ea138443d2d7629af0efa4c66572dfdb5360dd5c1"
  }
}
{
  "ETag": "\"8048e95a2c72097c274ccbdce9115ebb\"",
  "Size": 264379,
  "StorageClass": "STANDARD",
  "Key": "190828_NB552139_0023_AHKCYJBGXB/Config/Effective.cfg",
  "VersionId": "3vuUg9PqKDWlCyWfk8jbKyEqw2iF4Qk5",
  "IsLatest": true,
  "LastModified": "2019-08-28T20:13:07+00:00",
  "Owner": {
    "DisplayName": "aws",
    "ID": "5f378d7af9023313f9eb8f0ea138443d2d7629af0efa4c66572dfdb5360dd5c1"
  }
}
When root.set is called with the second item, whose key ends in /, it raises an exception at this check:
if not logical_key or logical_key.endswith('/'):
    raise QuiltException(
        f"Invalid logical key {logical_key!r}. "
        f"A package entry logical key cannot be a directory."
    )
We need to be able to add these files. Currently, I have a patch in place that just ignores the QuiltException. Obviously not ideal.
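As a possible alternative to swallowing the exception, the directory placeholders could be filtered out before any entry is set. The sketch below is not quilt3 code. `is_directory_marker` and `drop_directory_markers` are hypothetical helper names, and it assumes that a zero-byte object whose key ends in / is only a folder placeholder and carries no data. That matches the listing above, where both such entries have `Size` 0 and the ETag of an empty object.

```python
def is_directory_marker(obj):
    """Return True for a zero-byte object whose key ends in '/'.

    Assumption: such objects are folder placeholders, not real files.
    """
    return obj["Key"].endswith("/") and obj.get("Size", 0) == 0


def drop_directory_markers(objects):
    """Keep only the entries that represent real files."""
    return [obj for obj in objects if not is_directory_marker(obj)]


# Sample entries shaped like the listing above (only the fields used here).
sample = [
    {"Key": "190828_NB552139_0023_AHKCYJBGXB/", "Size": 0},
    {"Key": "190828_NB552139_0023_AHKCYJBGXB/Config/", "Size": 0},
    {"Key": "190828_NB552139_0023_AHKCYJBGXB/Config/Effective.cfg", "Size": 264379},
]

files = drop_directory_markers(sample)
print([obj["Key"] for obj in files])
```

Checking `Size` as well as the trailing / is deliberate: a non-empty object with a key ending in / would still reach the existing check and raise, instead of being dropped silently.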
Thanks for the detailed bug report. We'll circle back with a fix.
Adding in a dump of the offending bucket/key structure. I used the following little script to generate the file:
import boto3
import json
from datetime import datetime, date
from quilt3.packages import list_object_versions

def json_serial(obj):
    # json.dumps cannot serialize datetime values, so convert them to ISO 8601 strings.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()

s3_client = boto3.client('s3')
src_bucket = 'celsius-sequencing'
src_key = '190828_NB552139_0023_AHKCYJBGXB'

# Dump the raw S3 listing for the prefix to a JSON file.
obj_report_v2 = s3_client.list_objects_v2(Bucket=src_bucket, Prefix=src_key)
obj_report_json_v2 = json.dumps(obj_report_v2, default=json_serial, indent=2)
with open('obj_report_v2.json', 'w') as f:
    f.write(obj_report_json_v2)

# Can't do this - throws exception
# obj_report_q = list_object_versions(src_bucket, src_key)
# obj_report_json_q = json.dumps(obj_report_q, default=json_serial, indent=2)
# print(f'{obj_report_json_q}')