eWaterCycle/era5cli

Protection against Connection reset by peer

Closed this issue · 2 comments

I had almost finished downloading a 16.9 GB file, but at 16 GB era5cli raised the following exceptions:

2021-07-12 14:58:58,311 INFO Downloading https://download-0008.copernicus-climate.eu/cache-compute-0008/cache/data8/adaptor.mars.internal-1625760063.1087496-28596-14-063440c3-d9cd-40e8-9a5f-548b0a156e2b.nc to era5_2m_temperature_1990_hourly.nc (16.9G)
2021-07-12 15:49:23,960 INFO Download rate 5.7M/s
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████▊      | 16.0G/16.9G [50:25<02:07, 8.15MB/s
Traceback (most recent call last):
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/urllib3/response.py", line 518, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/http/client.py", line 455, in read
    n = self.readinto(b)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/http/client.py", line 499, in readinto
    n = self.fp.readinto(b)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/requests/models.py", line 751, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/urllib3/response.py", line 575, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/urllib3/response.py", line 540, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/verhoes/miniconda39/envs/ewatercycle/bin/era5cli", line 8, in <module>
    sys.exit(main())
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/era5cli/cli.py", line 423, in main
    _execute(args)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/era5cli/cli.py", line 415, in _execute
    era5.fetch(dryrun=args.dryrun)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/era5cli/fetch.py", line 160, in fetch
    self._split_variable_yr()
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/era5cli/fetch.py", line 225, in _split_variable_yr
    pool.map(self._getdata, variables, years, outputfiles)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/pathos/threading.py", line 136, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/era5cli/fetch.py", line 401, in _getdata
    connection.retrieve(name, request, outputfile)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/cdsapi/api.py", line 350, in retrieve
    result.download(target)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/cdsapi/api.py", line 173, in download
    return self._download(self.location, self.content_length, target)
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/cdsapi/api.py", line 133, in _download
    for chunk in r.iter_content(chunk_size=1024):
  File "/home/verhoes/miniconda39/envs/ewatercycle/lib/python3.9/site-packages/requests/models.py", line 754, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
sys.meta_path is None, Python is likely shutting down

Can era5cli catch this exception and retry, so that I don't have to restart the download from the beginning?

Possible overlap with #76

Hi Stefan, do you think we can catch this on line 445 in the following file?

era5cli/era5cli/fetch.py

Lines 432 to 446 in 3fcbe8e

def _getdata(self, variables: list, years: list, outputfile: str):
    """Fetch variables using cds api call."""
    name, request = self._build_request(variables, years)
    if self.dryrun:
        print(name, request, outputfile)
    else:
        queueing_message = (
            os.linesep, "Download request is being queued at Copernicus.",
            os.linesep,
            "It can take some time before downloading starts, ",
            "please do not kill this process in the meantime.", os.linesep)
        connection = cdsapi.Client()
        print("".join(queueing_message))  # print queueing message
        connection.retrieve(name, request, outputfile)
        era5cli.utils._append_history(name, request, outputfile)

With something like:

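from time import sleep

from requests.exceptions import ChunkedEncodingError

try:
    connection.retrieve(name, request, outputfile)
except ChunkedEncodingError:
    print("Possible loss of connection. Retrying once")
    sleep(2)  # brief pause before the single retry
    connection.retrieve(name, request, outputfile)  # try again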
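
If a single retry turns out to be too optimistic, the same idea could be generalised to a bounded number of attempts with a growing pause between them. The following is only a sketch using the standard library; `retry_call` and its parameters are hypothetical names, not part of era5cli or cdsapi.

```python
import time


def retry_call(func, retry_on, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    Re-raises the last exception once `attempts` calls have failed.
    `retry_on` is an exception class or a tuple of classes.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * 2 ** (attempt - 1)  # 2 s, 4 s, 8 s, ...
            print(f"Attempt {attempt} failed ({err!r}); retrying in {delay:g} s")
            sleep(delay)
```

Inside `_getdata` this might be used as `retry_call(lambda: connection.retrieve(name, request, outputfile), ChunkedEncodingError)`. Note that a retry at this level re-issues the whole `retrieve` call, so it is not expected to resume a partially written file by itself.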

I am not sure how to reliably catch the ConnectionResetError at the top of the stack trace.
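
One possible approach, given that the traceback shows the errors linked by "During handling of the above exception, another exception occurred", is to catch the outermost exception and walk its chain via `__cause__` and `__context__`. The sketch below uses stand-in exception classes (`FakeProtocolError`, `FakeChunkedEncodingError`) so it runs without requests or urllib3; `find_in_chain` is a hypothetical helper, and whether the real chain survives intact through the thread pool is an assumption that would need checking.

```python
def find_in_chain(exc, wanted):
    """Return the first exception of type `wanted` in exc's chain, else None.

    Follows __cause__ (explicit `raise ... from`) and __context__
    (implicit chaining, shown as "During handling of the above exception").
    """
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, wanted):
            return exc
        seen.add(id(exc))  # guard against reference cycles
        exc = exc.__cause__ or exc.__context__
    return None


# Stand-ins for the library exceptions in the traceback (hypothetical names).
class FakeProtocolError(Exception):
    pass


class FakeChunkedEncodingError(Exception):
    pass


def simulated_download():
    """Raise the same three-level chain as in the reported traceback."""
    try:
        try:
            raise ConnectionResetError(104, "Connection reset by peer")
        except ConnectionResetError as e:
            raise FakeProtocolError("Connection broken", e)
    except FakeProtocolError as e:
        raise FakeChunkedEncodingError(e)


try:
    simulated_download()
except FakeChunkedEncodingError as err:
    root = find_in_chain(err, ConnectionResetError)
    print(type(root).__name__, root.errno)  # prints: ConnectionResetError 104
```

With the real libraries, the `except` clause would name `requests.exceptions.ChunkedEncodingError` instead of the stand-in class.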

It seems the current version of cdsapi already retries, see https://github.com/ecmwf/cdsapi/blob/f3b94a9bc40f8d56b0d1ac8cc8bc84765509ef05/cdsapi/api.py#L604

The stack trace came from a cdsapi version older than 0.2.3, while the current era5cli requires cdsapi 0.4.0, so I think this issue has been resolved.
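
Anyone still seeing this error may want to check which cdsapi version is installed in their environment. A minimal sketch with the standard library follows; `version_tuple` and `is_at_least` are hypothetical helpers, the 0.2.3 threshold is simply the figure mentioned above, and the parsing is deliberately naive (it stops at the first non-numeric component).

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.4.0' into (0, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(package, minimum):
    """Return True/False for an installed package, or None if it is missing."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)


print("cdsapi >= 0.2.3:", is_at_least("cdsapi", "0.2.3"))
```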