create_file_from_path of fileservice overwrites SMBProperties with defaults
vladm opened this issue · 0 comments
Which service(blob, file, queue) does this issue concern?
azure.storage.file
Which version of the SDK was used? Please provide the output of pip freeze.
2.1.0
What problem was encountered?
I'm using create_file_from_path from azure/storage/file/fileservice.py to create a small test file. I'm passing last_write_time and create_time via SMBProperties because I need to preserve those values.
However, when I examine the file in Azure Storage Explorer, it shows the current date/time stamp instead of the values I passed.
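For reference, the timestamps I pass are UTC strings with seven fractional digits, matching the header values in the log (for example 2020-05-08T09:49:58.0000000Z). A minimal sketch of how such a string can be built with the standard library only; to_smb_timestamp is a hypothetical helper name, not part of the SDK, and it assumes naive datetimes are already in UTC:

```python
from datetime import datetime, timezone


def to_smb_timestamp(dt):
    """Format a datetime as a UTC string with seven fractional digits."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # %f yields six digits (microseconds); pad one zero to reach seven.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0Z"


# Modification time of the sample file from the log.
mtime = datetime(2020, 5, 8, 9, 49, 58, tzinfo=timezone.utc)
print(to_smb_timestamp(mtime))
```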
Looking at the log, I can see that two PUT requests are made:
1st PUT passes the correct create_time and last_write_time via the x-ms-file-creation-time and x-ms-file-last-write-time header fields.
Based on the call stack, it comes from the create_file call within create_file_from_stream.
2nd PUT carries none of the SMBProperties: the corresponding fields are missing from the HTTP PUT request, which comes from process_chunk (via update_range). Its response reports last-modified as the current time.
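The difference between the two requests can also be checked mechanically. This is only a sketch for illustration, not SDK code: missing_smb_headers and SMB_TIME_HEADERS are hypothetical names, and the two dictionaries are abridged copies of the request headers from the log.

```python
# Header names taken from the first PUT in the log.
SMB_TIME_HEADERS = ("x-ms-file-creation-time", "x-ms-file-last-write-time")


def missing_smb_headers(headers):
    """Return the SMB time headers that are absent or empty in a request."""
    return [name for name in SMB_TIME_HEADERS if not headers.get(name)]


# 1st PUT (create_file), abridged from the log.
create_headers = {
    "x-ms-content-length": "8737",
    "x-ms-type": "file",
    "x-ms-file-attributes": "Archive",
    "x-ms-file-creation-time": "2020-05-08T09:49:58.0000000Z",
    "x-ms-file-last-write-time": "2020-05-08T09:49:58.0000000Z",
}

# 2nd PUT (update_range, comp=range), abridged from the log.
range_headers = {
    "x-ms-write": "update",
    "x-ms-range": "bytes=0-8736",
    "Content-Length": "8737",
}

for label, headers in (("create", create_headers), ("range", range_headers)):
    print(label, "missing:", missing_smb_headers(headers))
```

The first request reports nothing missing, while the range upload lacks both time headers, which matches what I see in the log.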
Here is the full log for your reference:
[2020-12-08 16:06:05,864] {bbm_sftp2afs.py:154} INFO - Uploading /tmp/tmpl13zcmod to afs://airflowtest/postlogs as sample.csv and modification time 2020-05-08T09:49:58.0000000Z
[2020-12-08 16:06:05,889] {storageclient.py:331} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'timeout': None},
Headers={'x-ms-content-length': '8737', 'x-ms-type': 'file', 'x-ms-file-permission': 'Inherit', 'x-ms-file-attributes': 'Archive', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': None, 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:05 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:05,890] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "bin/airflow", line 37, in
args.func(args)
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper
return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run
_run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 476, in _run
run_job.run()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/base_job.py", line 218, in run
self._execute()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 94, in _execute
self.task_runner.start()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 43, in start
self.process = self._start_by_fork()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 86, in _start_by_fork
args.func(args, dag=self.dag)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper
return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run
_run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 481, in _run
pool=args.pool,
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(args, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "/home/vlad/airflow/plugins/operators/bbm_sftp2afs.py", line 157, in execute
creation_time=self.afs_load_options['creation_time'])
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/contrib/hooks/azure_fileshare_hook.py", line 172, in load_file
file_name, file_path, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1943, in create_file_from_path
max_connections, file_permission=file_permission, smb_properties=smb_properties, timeout=timeout)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2134, in create_file_from_stream
timeout=timeout
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1888, in create_file
self._perform_request(request)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request
for line in traceback.format_stack():
[2020-12-08 16:06:06,142] {storageclient.py:357} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed3-701a-0065-50ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'last-modified': 'Fri, 08 May 2020 09:49:58 GMT', 'etag': '"0x8D7F3352B362700"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed3-701a-0065-50ae-cd92db000000', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-file-change-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': '106997116661656136764385920356675382498', 'x-ms-file-attributes': 'Archive', 'x-ms-file-id': '13835064652351930368', 'x-ms-file-parent-id': '13835128424026341376', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.
[2020-12-08 16:06:06,143] {storageclient.py:331} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'comp': 'range', 'timeout': None}, Headers={'x-ms-write': 'update', 'x-ms-range': 'bytes=0-8736', 'Content-Length': '8737', 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:06 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:06,143] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 69, in _worker
work_item.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 82, in process_chunk
return self._upload_chunk_with_progress(chunk_offset, chunk_data)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 129, in _upload_chunk_with_progress
timeout=self.timeout
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2691, in update_range
self._perform_request(request)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request
for line in traceback.format_stack():
[2020-12-08 16:06:06,194] {storageclient.py:357} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed7-701a-0065-51ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'content-md5': 'e8Xp61MKK0lkStIaa2iwuw==', 'last-modified': 'Tue, 08 Dec 2020 22:06:11 GMT', 'etag': '"0x8D89BC578FC2B35"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed7-701a-0065-51ae-cd92db000000', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.