UnicodeDecodeError on special characters when storing file
pcstout opened this issue · 11 comments
Bug Report
Operating system
Ubuntu 18.04
Client version
1.9.2
Description of the problem
Throws exception when uploading a file where the file path contains special characters.
This is a blocking issue for us.
Repro Script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import synapseclient
filename = "TestûTest.txt"
with open(filename, mode='w') as f:
    f.write('test text')
syn = synapseclient.Synapse()
syn.login()
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
Expected behavior
Does not error. Uploads file.
Actual behavior
Throws exception. Does not upload file.
Traceback (most recent call last):
File "./bug.py", line 14, in <module>
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/entity.py", line 578, in __init__
kwargs['name'] = utils.guess_file_name(path)
File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/utils.py", line 243, in guess_file_name
tokens = [x for x in path.split('/') if x != '']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
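The error message above can be reproduced in isolation. This is a minimal sketch, assuming (the thread does not confirm it) that on Python 2 the path is a UTF-8 byte string that is implicitly decoded as ASCII when it is split. The snippet runs on Python 3 and makes that decode explicit; `raw_path` is an illustrative value, not the real path from the traceback.

```python
# Illustrative only: triggers the same class of error with an explicit decode.
name = "Test\u00fbTest.txt"      # "\u00fb" is the character û from the repro script
raw_path = name.encode("utf-8")  # hypothetical byte-string path

# In UTF-8, û (U+00FB) is encoded as the two bytes 0xc3 0xbb.
u_bytes = "\u00fb".encode("utf-8")

try:
    # Decoding non-ASCII bytes with the 'ascii' codec fails on the first such byte.
    raw_path.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The byte 0xc3 reported in the traceback matches the first byte of û in UTF-8, which is consistent with this reading.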
@kimyen I get a different error under Python 3.5.
Same script as above but under 3.5 (this takes many minutes to finally throw the error):
./bug.py
##################################################
Uploading file to Synapse storage
##################################################
Traceback (most recent call last):
File "./bug.py", line 13, in <module>
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/client.py", line 971, in store
mimetype=local_state_fh.get('contentType'))
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/upload_functions.py", line 67, in upload_file_handle
return upload_synapse_s3(syn, expanded_upload_path, location['storageLocationId'], mimetype=mimetype)
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/upload_functions.py", line 126, in upload_synapse_s3
file_handle_id = multipart_upload(syn, file_path, contentType=mimetype, storageLocationId=storageLocationId)
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/multipart_upload.py", line 221, in multipart_upload
**kwargs)
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/multipart_upload.py", line 340, in _multipart_upload
storageLocationId=storageLocationId, **kwargs)
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/multipart_upload.py", line 116, in _start_multipart_upload
endpoint=syn.fileHandleEndpoint))
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/client.py", line 3347, in restPOST
exceptions._raise_for_status(response, verbose=self.debug)
File "/home/pstout/tmp/syn_bugs/.venv/lib/python3.5/site-packages/synapseclient/exceptions.py", line 153, in _raise_for_status
raise SynapseHTTPError(message, response=response)
synapseclient.exceptions.SynapseHTTPError: 503 Server Error:
Server error, try again later: javax.servlet.ServletException: javax.servlet.ServletException: org.springframework.web.util.NestedServletException: Request processing failed; nested exception is com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: DD30979C5FB0535C; S3 Extended Request ID: xBCbGi51hoS0/+cicYWopWITw8MYRjM5+WLhQZvOIGT5ZxnZGjlsxYMjK0JNeKLqNnQz7B8u43k=), S3 Extended Request ID: xBCbGi51hoS0/+cicYWopWITw8MYRjM5+WLhQZvOIGT5ZxnZGjlsxYMjK0JNeKLqNnQz7B8u43k=
@pcstout,
The first bug, which you encountered on Python 2.7, is a Synapse Python Client bug. I asked about Python 3 because we are dropping support for Python 2, so if a function works on Python 3 but fails on Python 2, we may not fix it.
The second bug you encountered is a backend error. Can you provide the bug.py file or a link to it? I need to open a backend ticket for this.
@kimyen Here is bug.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import synapseclient
filename = "TestûTest.txt"
with open(filename, mode='w') as f:
    f.write('test text')
syn = synapseclient.Synapse()
syn.login()
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
print('Done.')
Thanks @pcstout for providing the script.
I am running it now and it appears that the client hangs. The error you saw indicates that your script kept retrying until Synapse became unavailable and then threw a server error. We migrate every week, so I am curious whether you were testing this while we were performing a migration. Do you happen to have a record of when you ran into the server error above?
I'm tracking the issue here: https://sagebionetworks.jira.com/browse/SYNPY-963
@kimyen I don't know the exact time but it was yesterday morning that I ran into this issue.
Hi, I'm writing to ask how critical it is to your application to use the file name "TestûTest.txt". We use Amazon S3 for storing files ("objects"), and their guidelines for names ("keys") say, in part, to avoid characters in the 128–255 decimal range (û is code point 251, which is outside ASCII):
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
Elsewhere in Synapse we restrict file name characters to letters, numbers, spaces, underscores, hyphens, periods, plus signs, and parentheses. To make Synapse as stable and consistent as possible we would apply this restriction universally. If this restriction would prevent your using Synapse, please help us understand why.
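For anyone who wants to check names locally before uploading, the character set listed above could be sketched as a regular expression. This is an assumption-laden sketch, not Synapse's actual validation: `is_allowed_name` is a hypothetical helper, and "letters" and "numbers" are taken to mean ASCII letters and digits.

```python
import re

# Assumption: letters and numbers are ASCII only; the remaining characters are
# space, underscore, hyphen, period, plus sign, and parentheses.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9 _\-.+()]+")


def is_allowed_name(name):
    """Return True if name is non-empty and uses only the allowed characters.

    Hypothetical helper for illustration; not part of synapseclient.
    """
    return ALLOWED_NAME.fullmatch(name) is not None


print(is_allowed_name("Test_Test (v1.0+).txt"))
print(is_allowed_name("Test\u00fbTest.txt"))  # the name from the repro script
```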
@brucehoff Not critical at all. The team has decided to identify and clean up these file names so this shouldn't be an issue going forward.
I agree with the universal restriction. Hopefully that will cause these files to error out quicker too (right now it takes many minutes to error).
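One possible way to do that kind of cleanup is sketched below. It is an illustration only: `clean_file_name` is a hypothetical helper, the allowed character set is assumed to be ASCII letters and digits plus the punctuation listed earlier in the thread, and the choice to strip accents rather than reject the name is a design decision, not something the thread specifies.

```python
import re
import unicodedata

# Assumption: anything outside this set is not allowed in a Synapse file name.
DISALLOWED = re.compile(r"[^A-Za-z0-9 _\-.+()]")


def clean_file_name(name, replacement="_"):
    """Return an ASCII-only version of name (hypothetical helper)."""
    # Split accented characters into a base letter plus combining marks,
    # e.g. U+00FB becomes "u" followed by a combining circumflex.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop every character that is still non-ASCII, including combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace remaining disallowed ASCII characters with the replacement.
    return DISALLOWED.sub(replacement, ascii_only)


print(clean_file_name("Test\u00fbTest.txt"))
```

Two different original names can map to the same cleaned name, so a real cleanup script would need to check for collisions before renaming.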
Resolution is tracked in JIRA: https://sagebionetworks.jira.com/browse/PLFM-5510
PLFM-5510 has been closed. This issue should be resolved. Please reopen if it still occurs.