Azure/azure-storage-python

create_file_from_path of fileservice overwrites SMBProperties with defaults

vladm opened this issue · 0 comments

Which service(blob, file, queue) does this issue concern?

azure.storage.file

Which version of the SDK was used? Please provide the output of pip freeze.

2.1.0

What problem was encountered?

I'm using create_file_from_path from azure/storage/file/fileservice.py to create a small test file. I'm passing last_write_time and create_time via SMBProperties as I need to preserve those values.

However, the file is created with the current date/time stamp when examined in Azure Storage Explorer.
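For reference, here is a minimal sketch of how a modification time could be rendered in the timestamp format that appears in the x-ms-file-creation-time and x-ms-file-last-write-time headers in the log. The helper name to_smb_timestamp is hypothetical and not part of the SDK, and whether the SDK expects a string in this format or a datetime object is an assumption that should be checked against the SDK source.

```python
from datetime import datetime, timezone


def to_smb_timestamp(epoch_seconds):
    """Render a POSIX timestamp as UTC with seven fractional digits.

    Hypothetical helper (not an SDK function); it mirrors the format seen
    in the logged headers, e.g. 2020-05-08T09:49:58.0000000Z.
    """
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # %f yields six digits (microseconds); the logged format shows seven,
    # so one trailing zero is appended before the "Z" suffix.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0Z"


# The modification time reported in the first log line of this issue.
example = datetime(2020, 5, 8, 9, 49, 58, tzinfo=timezone.utc).timestamp()
print(to_smb_timestamp(example))
```

In practice the input would typically come from something like os.path.getmtime on the local file before the upload.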

Looking at the log, I can see that for some reason, there are two PUT requests made:

1st PUT, which passes the correct last_write_time and create_time via the x-ms-file-creation-time and x-ms-file-last-write-time header fields.
Based on the call stack, it comes from the create_file call within create_file_from_stream.

2nd PUT, which drops all the SMBProperties and does not include the corresponding fields in the HTTP PUT request. Based on the call stack, it comes from within process_chunk.
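The difference between the two requests can be checked mechanically. The sketch below copies an abridged subset of the request headers from the log in this issue and lists which SMB-related headers carry a value in each PUT. The helper smb_headers_in and the SMB_HEADERS set are names made up for this illustration, not SDK identifiers.

```python
# SMB-related request headers, as named in the first PUT in the log.
SMB_HEADERS = {
    "x-ms-file-permission",
    "x-ms-file-permission-key",
    "x-ms-file-attributes",
    "x-ms-file-creation-time",
    "x-ms-file-last-write-time",
}

# Abridged headers of the 1st PUT (create_file), copied from the log.
create_headers = {
    "x-ms-content-length": "8737",
    "x-ms-type": "file",
    "x-ms-file-permission": "Inherit",
    "x-ms-file-attributes": "Archive",
    "x-ms-file-creation-time": "2020-05-08T09:49:58.0000000Z",
    "x-ms-file-last-write-time": "2020-05-08T09:49:58.0000000Z",
    "x-ms-file-permission-key": None,
    "x-ms-version": "2019-02-02",
}

# Abridged headers of the 2nd PUT (update_range), copied from the log.
range_headers = {
    "x-ms-write": "update",
    "x-ms-range": "bytes=0-8736",
    "Content-Length": "8737",
    "x-ms-version": "2019-02-02",
}


def smb_headers_in(headers):
    """Return the sorted SMB header names that are present with a value."""
    return sorted(
        name for name in SMB_HEADERS.intersection(headers)
        if headers[name] is not None
    )


print("1st PUT:", smb_headers_in(create_headers))
print("2nd PUT:", smb_headers_in(range_headers))
```

With the logged values, the first request carries the creation and last-write times while the second carries no SMB headers at all, which matches the observation above.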

Here is the full log for your reference:

[2020-12-08 16:06:05,864] {bbm_sftp2afs.py:154} INFO - Uploading /tmp/tmpl13zcmod to afs://airflowtest/postlogs as sample.csv and modification time 2020-05-08T09:49:58.0000000Z
[2020-12-08 16:06:05,889] {storageclient.py:331} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'timeout': None},
Headers={'x-ms-content-length': '8737', 'x-ms-type': 'file', 'x-ms-file-permission': 'Inherit', 'x-ms-file-attributes': 'Archive', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': None, 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:05 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:05,890] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "bin/airflow", line 37, in <module>
args.func(args)
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper
return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run
_run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 476, in _run
run_job.run()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/base_job.py", line 218, in run
self._execute()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 94, in _execute
self.task_runner.start()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 43, in start
self.process = self._start_by_fork()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 86, in _start_by_fork
args.func(args, dag=self.dag)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper
return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run
_run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 481, in _run
pool=args.pool,
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(args, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "/home/vlad/airflow/plugins/operators/bbm_sftp2afs.py", line 157, in execute
creation_time=self.afs_load_options['creation_time'])
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/contrib/hooks/azure_fileshare_hook.py", line 172, in load_file
file_name, file_path, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1943, in create_file_from_path
max_connections, file_permission=file_permission, smb_properties=smb_properties, timeout=timeout)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2134, in create_file_from_stream
timeout=timeout
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1888, in create_file
self._perform_request(request)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request
for line in traceback.format_stack():
[2020-12-08 16:06:06,142] {storageclient.py:357} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed3-701a-0065-50ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'last-modified': 'Fri, 08 May 2020 09:49:58 GMT', 'etag': '"0x8D7F3352B362700"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed3-701a-0065-50ae-cd92db000000', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-file-change-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': '10699711666165613676
4385920356675382498', 'x-ms-file-attributes': 'Archive', 'x-ms-file-id': '13835064652351930368', 'x-ms-file-parent-id': '13835128424026341376', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.

[2020-12-08 16:06:06,143] {storageclient.py:331} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'comp': 'range', 'timeout': None}, Headers={'x-ms-write': 'update', 'x-ms-range': 'bytes=0-8736', 'Content-Length': '8737', 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:06 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:06,143] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 69, in _worker
work_item.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 82, in process_chunk
return self._upload_chunk_with_progress(chunk_offset, chunk_data)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 129, in _upload_chunk_with_progress
timeout=self.timeout
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2691, in update_range
self._perform_request(request)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request
for line in traceback.format_stack():
[2020-12-08 16:06:06,194] {storageclient.py:357} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed7-701a-0065-51ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'content-md5': 'e8Xp61MKK0lkStIaa2iwuw==', 'last-modified': 'Tue, 08 Dec 2020 22:06:11 GMT', 'etag': '"0x8D89BC578FC2B35"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed7-701a-0065-51ae-cd92db000000', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.
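
The two responses in the log above can be compared in the same way. This is a small sketch using only the last-modified and Server-Timestamp values copied from the log. It assumes that the last-modified response header reflects the timestamp shown in Azure Storage Explorer, which the log alone does not prove.

```python
from email.utils import parsedate_to_datetime

# last-modified returned by the 1st PUT (create_file), from the log.
after_create = parsedate_to_datetime("Fri, 08 May 2020 09:49:58 GMT")

# last-modified returned by the 2nd PUT (update_range), from the log.
after_range = parsedate_to_datetime("Tue, 08 Dec 2020 22:06:11 GMT")

# Server-Timestamp logged for the 2nd response.
server_time = parsedate_to_datetime("Tue, 08 Dec 2020 22:06:11 GMT")

# After create_file, last-modified still equals the requested write time.
print("preserved after create:", after_create.date().isoformat())

# After update_range, last-modified has moved to the server's current time.
print("reset by range update:", after_range == server_time)
```

The first response still reports the requested May 2020 time, and the second reports the server time of the upload, so the reset happens at the range update rather than at file creation.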