Sage-Bionetworks/synapsePythonClient

Endpoint URL not utilized for external S3 resources

kellrott opened this issue · 3 comments

Bug Report


Operating system

Linux

Client version

Output of the failing command:

$ synapse store --parentId syn27256137 b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz

##################################################
This Synapse Project has transitioned to use storage maintained at the NCI Genomic Data Commons (GDC). GDC credentials are required for accessing files. Please contact the CCG Program Office to request GDC credentials
Uploading to endpoint: [https://gdc-jamboree-objstore.datacommons.io] bucket: [gdc-alch-jamboree]
##################################################


S3UploadFailedError: Failed to upload b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz to gdc-alch-jamboree/ab067ed0-ccc6-4361-9c2e-6544249fe1cb/b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz: An error occurred (InvalidRequest) when calling the CreateMultipartUpload operation: Invalid canned ACL

Description of the problem

  • Attempted an upload to a project backed by a private/external S3 endpoint; the upload failed
  • Downloads from the same project/custom endpoint work, so credentials are not the issue

Expected behavior

  • The upload succeeds against the project's configured external S3 endpoint

Actual behavior

  • The upload fails with an "Invalid canned ACL" (InvalidRequest) error from the CreateMultipartUpload operation

Based on a reading of the code, the issue appears to be at
https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/core/upload/upload_functions.py#L198

The call that creates the upload function:

def upload_fn(credentials):
    return S3ClientWrapper.upload_file(
        bucket_name,
        None,
        remote_file_key,
        local_path,
        credentials=credentials,
        transfer_config_kwargs={'max_concurrency': syn.max_threads}
    )

has its second argument, the endpoint_url, hard-coded to None. This needs to be configured the same way it is done at https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/client.py#L1830, where S3ClientWrapper.download_file is given the endpoint_url from the file handle.
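
A minimal sketch of the proposed change (assuming an upload_destination object is available in the enclosing scope and exposes the endpoint under a key such as 'endpointUrl'; both names are assumptions for illustration, not taken from the client's actual API):

# Hypothetical sketch: pass the destination's endpoint instead of a hard-coded None.
endpoint_url = upload_destination.get('endpointUrl')  # would be None for standard AWS S3 storage

def upload_fn(credentials):
    return S3ClientWrapper.upload_file(
        bucket_name,
        endpoint_url,  # previously hard-coded to None
        remote_file_key,
        local_path,
        credentials=credentials,
        transfer_config_kwargs={'max_concurrency': syn.max_threads}
    )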

Thanks for the report @kellrott. This is a known issue, and the problem is this line:

ExtraArgs={'ACL': 'bucket-owner-full-control'},

Basically, this was added because of an earlier issue where the owner of the S3 bucket did not actually have access to the objects uploaded into the bucket.

That being said, this particular canned ACL is not supported on IBM buckets. We are currently unsure of a resolution other than pointing people to older versions of the synapseclient (2.3.1), unfortunately.
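
For context, here is a minimal boto3 sketch (not the client's actual code; the credential values are placeholders) showing how that canned ACL reaches the storage backend: for large files, boto3's managed transfer forwards ExtraArgs to the CreateMultipartUpload request, and an S3-compatible store that does not implement 'bucket-owner-full-control' rejects it with the InvalidRequest error shown above.

import boto3
from boto3.s3.transfer import TransferConfig

# S3-compatible, non-AWS endpoint (taken from the banner in the report above)
s3 = boto3.client(
    's3',
    endpoint_url='https://gdc-jamboree-objstore.datacommons.io',
    aws_access_key_id='...',      # placeholder
    aws_secret_access_key='...',  # placeholder
)

# The canned ACL in ExtraArgs is attached to the CreateMultipartUpload call,
# which this backend rejects with "InvalidRequest: Invalid canned ACL".
s3.upload_file(
    'b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz',
    'gdc-alch-jamboree',
    'ab067ed0-ccc6-4361-9c2e-6544249fe1cb/b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz',
    ExtraArgs={'ACL': 'bucket-owner-full-control'},
    Config=TransferConfig(max_concurrency=4),
)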

I had been using an older version of synapseclient (2.3.1), but recently an attempt to install that version failed. I'm not sure which dependency (or dependencies) has a new release that causes conda and pip to fail the install. Do you have a record of the versions of Python, pandas, boto3, etc. that are compatible with 2.3.1?

@JenniferShelton Apologies, I must have missed this message!

Python==3.8,3.9
pandas>=0.25.0,<1.5
boto3>=1.7.0,<2.0
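
For example, a compatible environment could be pinned along these lines (a sketch based on the constraints above, not an officially published requirements set):

# inside a Python 3.8 or 3.9 environment
$ python -m pip install 'synapseclient==2.3.1' 'pandas>=0.25.0,<1.5' 'boto3>=1.7.0,<2.0'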

We just onboarded an engineer to help with the client, and I'll be sure to revisit this issue, as we will eventually reach the point where Python 3.9 hits end of life. For more transparency, here is the internal Jira ticket tracking this work: https://sagebionetworks.jira.com/browse/SYNPY-1198