gsutil runs into socket timeout with the -m option
MichaelJThomas-2016 opened this issue · 0 comments
Hi,
I am trying to rsync a bucket from GCS to AWS S3 with gsutil.
I am using Cloud Composer to schedule a bash script that runs:
set -e
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
sudo apt-get update -y && sudo apt-get install google-cloud-cli -y # This in itself is an issue with the python runtime on GKE
gsutil -o "GSUtil:max_upload_compression_buffer_size=8G" -m rsync -r gs://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}} \
s3://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}}
If I remove the -m option, the Composer task fails (an issue I should ask them about), although a few files do upload. If I keep -m, I get:
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - Traceback (most recent call last):
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 980, in _bootstrap_inner
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - self.run()
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 917, in run
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - self._target(*self._args, **self._kwargs)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 189, in PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - self.gsutil_api.GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 352, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return self._GetApi(provider).GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1244, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return self._PerformDownload(bucket_name,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1383, in _PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - apitools_download.GetRange(additional_headers=additional_headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 485, in GetRange
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - response = self.__GetChunk(progress, end_byte,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 418, in __GetChunk
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return http_wrapper.MakeRequest(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 359, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - retry_func(ExceptionRetryArgs(http, http_request, e, retry,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/retry_util.py", line 84, in RetriesInDataTransferHandler
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - http_wrapper.RethrowExceptionHandler(retry_args)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return _MakeRequestNoRetry(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - info, content = http.request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 544, in NewRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return request_orig(uri, method=method, body=body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 173, in new_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - resp, content = request(orig_request_method, uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 280, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return http_callable(uri, method=method, body=body, headers=headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/httplib2/python3/httplib2/__init__.py", line 1701, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - (response, content) = self._request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 452, in OverrideRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - (response, content) = self._conn_request(conn, request_uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 685, in _conn_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - new_data = http_stream.read(TRANSFER_BUFFER_SIZE)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 403, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - data = orig_read_func(amt)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 463, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - n = self.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 507, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - n = self.fp.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/socket.py", line 704, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return self._sock.recv_into(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1242, in recv_into
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return self.read(nbytes, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1100, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - return self._sslobj.read(len, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - socket.timeout: The read operation timed out
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - The read operation timed out
I'm not sure whether the problem is on the AWS end or not, but any help would be appreciated.
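
One thing that may help narrow this down: the frames in the traceback run through daisy_chain_wrapper.py (PerformDownload) and gcs_json_api.py (GetObjectMedia), which suggests the read that timed out was the download from GCS rather than the upload to S3. Below is a minimal sketch of a workaround to try, not a confirmed fix: re-run the rsync a few times with reduced parallelism. The `retry` helper is hypothetical (it is not part of gsutil), and the attempt count, delay, and thread/process values are illustrative assumptions. It also assumes gsutil exits non-zero when a transfer thread dies this way, which I have not verified.

```shell
#!/usr/bin/env bash
# retry: hypothetical helper, not part of gsutil.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # "until" is a conditional context, so a failing command does not
  # abort the script even when "set -e" is active.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Example invocation (commented out; SRC and DST stand for the gs:// and
# s3:// URLs from the script above, and the values are illustrative):
#
#   retry 3 30 gsutil \
#     -o "GSUtil:parallel_thread_count=4" \
#     -o "GSUtil:parallel_process_count=1" \
#     -m rsync -r "$SRC" "$DST"
```

Because rsync only copies objects that are missing or different at the destination, each retry should pick up where the previous attempt left off instead of starting over. If the timeouts disappear at lower parallelism, that would point toward resource or bandwidth pressure on the Composer worker rather than a problem on the AWS side.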