[BUG] Download gets stuck when a network failure occurs
Describe the bug
When a network error (a read timeout) occurs, dsnap cannot recover. As the download nears completion, it gets stuck on the last few blocks. My theory is that a locking or synchronization mechanism still considers the failed thread to be running and downloading those last blocks, even though the thread has already exited with an exception.
Note that the number of network errors in the log (10+) is much greater than the number of blocks still missing.
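The theory above can be illustrated with a minimal, self-contained sketch. It assumes that the downloader hands blocks to worker threads through a `queue.Queue` and waits for them with `Queue.join()`. That is an assumption and has not been checked against dsnap's source. `fetch_block`, `worker` and `run` are hypothetical names. When `fetch_block` raises, the worker thread dies before it calls `task_done()`. The queue's unfinished-task count therefore never reaches zero and `join()` blocks forever. The only visible sign of the failure is the "Exception in thread ..." traceback that `threading.excepthook` prints, which resembles the one in this report.

```python
import queue
import threading


def fetch_block(block: int) -> None:
    """Hypothetical stand-in for a block download; block 3 simulates a read timeout."""
    if block == 3:
        raise TimeoutError("The read operation timed out")


def worker(q: "queue.Queue[int]", saved: list[int]) -> None:
    while True:
        block = q.get()
        fetch_block(block)  # an exception here kills the thread...
        saved.append(block)
        q.task_done()  # ...so this line is never reached for the failed block


def run(num_blocks: int = 6, num_workers: int = 2) -> tuple[list[int], bool]:
    q: "queue.Queue[int]" = queue.Queue()
    saved: list[int] = []
    for block in range(num_blocks):
        q.put(block)
    for _ in range(num_workers):
        threading.Thread(target=worker, args=(q, saved), daemon=True).start()

    # Run q.join() in a daemon thread so the demo can give up instead of hanging.
    waiter = threading.Thread(target=q.join, daemon=True)
    waiter.start()
    waiter.join(timeout=2)
    return sorted(saved), waiter.is_alive()


if __name__ == "__main__":
    saved, stuck = run()
    print(f"saved {len(saved)} of 6 blocks, stuck={stuck}")
```

The surviving worker finishes every other block. The demo should therefore report 5 of 6 blocks saved and `stuck=True`, which matches the "Saved block 5451 of 5454" symptom: the progress counter stops short of the total and the program waits indefinitely.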
...
Exception in thread Thread-3 (<lambda>):
Traceback (most recent call last):
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/usr/lib/python3.10/http/client.py", line 482, in read
s = self._safe_read(self.length)
File "/usr/lib/python3.10/http/client.py", line 631, in _safe_read
data = self.fp.read(amt)
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.10/ssl.py", line 1303, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.10/ssl.py", line 1159, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/botocore/response.py", line 99, in read
chunk = self._raw_stream.read(amt)
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/urllib3/response.py", line 566, in read
with self._error_catcher():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/urllib3/response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ebs.us-east-1.amazonaws.com', port=443): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/dsnap/snapshot.py", line 127, in <lambda>
t = Thread(target=lambda: self._run(func))
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/dsnap/snapshot.py", line 148, in _run
raise e
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/dsnap/snapshot.py", line 139, in _run
f(block)
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/dsnap/snapshot.py", line 178, in download
b.fetch().write()
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/dsnap/snapshot.py", line 43, in write
data = self.BlockData.read()
File "/home/intense/dsnap/venv/lib/python3.10/site-packages/botocore/response.py", line 102, in read
raise ReadTimeoutError(endpoint_url=e.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "None"
Saved block 5451 of 5454
^C
Aborted!
Expected behavior
Detect stuck block-download threads, clean them up after a timeout, and re-download the affected blocks.
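One way to get this behavior is sketched below, assuming that blocks are distributed to worker threads through a `queue.Queue` and awaited with `Queue.join()`. This is not confirmed from dsnap's source. `fetch_block`, `download_all` and `MAX_ATTEMPTS` are hypothetical names. The sketch does not use a timeout-based watchdog. Instead, each worker catches the exception itself, re-queues the block for another attempt, and always calls `task_done()` in a `finally` clause. Because a worker that handles its own errors never dies, no stuck thread is left to clean up. Blocks that exhaust their retries are recorded rather than silently lost.

```python
import queue
import threading
from collections.abc import Callable

MAX_ATTEMPTS = 3  # illustrative retry limit (assumption, not a dsnap setting)


def worker(
    q: "queue.Queue[tuple[int, int]]",
    fetch_block: Callable[[int], None],
    saved: list[int],
    failed: list[int],
) -> None:
    while True:
        block, attempt = q.get()
        try:
            fetch_block(block)
            saved.append(block)
        except Exception:
            if attempt + 1 < MAX_ATTEMPTS:
                q.put((block, attempt + 1))  # re-queue before task_done() so join() keeps waiting
            else:
                failed.append(block)  # give up on this block, but record it instead of hanging
        finally:
            q.task_done()  # always runs, so the unfinished-task count can reach zero


def download_all(
    num_blocks: int, fetch_block: Callable[[int], None], num_workers: int = 4
) -> tuple[list[int], list[int]]:
    q: "queue.Queue[tuple[int, int]]" = queue.Queue()
    saved: list[int] = []
    failed: list[int] = []
    for block in range(num_blocks):
        q.put((block, 0))  # (block index, attempt number)
    for _ in range(num_workers):
        threading.Thread(
            target=worker, args=(q, fetch_block, saved, failed), daemon=True
        ).start()
    q.join()  # returns once every block was either saved or given up on
    return sorted(saved), sorted(failed)


# Usage: block 3 times out twice and then succeeds; block 5 always times out.
attempts: dict[int, int] = {}
lock = threading.Lock()


def flaky_fetch(block: int) -> None:
    with lock:
        attempts[block] = attempts.get(block, 0) + 1
        n = attempts[block]
    if (block == 3 and n < 3) or block == 5:
        raise TimeoutError("The read operation timed out")


saved, failed = download_all(8, flaky_fetch)
print(saved, failed)  # [0, 1, 2, 3, 4, 6, 7] [5]
```

Catching the broad `Exception` keeps the sketch short. A real fix would more likely catch the specific errors seen in this report, such as `botocore.exceptions.ReadTimeoutError`, and let unexpected errors surface.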
Desktop (please complete the following information):
- WSL2 environment
- Linux HOSTNAME 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
The same thing is happening here: it gets stuck at that block, with many "Read timeout" errors logged before it.
return self._sslobj.read(len, buffer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\Lib\site-packages\botocore\response.py", line 99, in read
chunk = self._raw_stream.read(amt)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\Lib\site-packages\urllib3\response.py", line 566, in read
with self._error_catcher():
File "C:\ProgramData\anaconda3\Lib\contextlib.py", line 155, in __exit__
self.gen.throw(typ, value, traceback)
File "C:\ProgramData\anaconda3\Lib\site-packages\urllib3\response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: AWSHTTPSConnectionPool(host='ebs.us-east-1.amazonaws.com', port=443): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\Lib\threading.py", line 1038, in _bootstrap_inner
self.run()
File "C:\ProgramData\anaconda3\Lib\threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\anaconda3\Lib\site-packages\dsnap\snapshot.py", line 127, in <lambda>
t = Thread(target=lambda: self._run(func))
^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\Lib\site-packages\dsnap\snapshot.py", line 148, in _run
raise e
File "C:\ProgramData\anaconda3\Lib\site-packages\dsnap\snapshot.py", line 139, in _run
f(block)
File "C:\ProgramData\anaconda3\Lib\site-packages\dsnap\snapshot.py", line 178, in download
b.fetch().write()
File "C:\ProgramData\anaconda3\Lib\site-packages\dsnap\snapshot.py", line 43, in write
data = self.BlockData.read()
^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\anaconda3\Lib\site-packages\botocore\response.py", line 102, in read
raise ReadTimeoutError(endpoint_url=e.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "None"
Saved block 61402 of 61429