Load flakiness retry logic mismatches paramiko-ng
timsnyder-siv opened this issue · 1 comments
Lines 496 to 506 in eb89992
Specifically, the msg == 'Error reading SSH protocol banner' comparison
seems to check the message too strictly.
I had a long-running @parallel fabric task crash with this stack trace:
Exception: Error reading SSH protocol banner
Traceback (most recent call last):
File ".conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 2049, in _check_banner
buf = self.packetizer.readline(timeout)
File ".conda-env/lib/python3.9/site-packages/paramiko/packet.py", line 360, in readline
buf += self._read_timeout(timeout)
File "conda-env/lib/python3.9/site-packages/paramiko/packet.py", line 575, in _read_timeout
raise EOFError()
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 1904, in run
self._check_banner()
File ".conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 2053, in _check_banner
raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
Exception: Error reading SSH protocol banner[Errno 104] Connection reset by peer
Traceback (most recent call last):
File ".conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 2049, in _check_banner
buf = self.packetizer.readline(timeout)
File ".conda-env/lib/python3.9/site-packages/paramiko/packet.py", line 360, in readline
buf += self._read_timeout(timeout)
File ".conda-env/lib/python3.9/site-packages/paramiko/packet.py", line 573, in _read_timeout
x = self.__socket.recv(128)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/centos/src/project_data/federation_pit-1175/firesim/.conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 1904, in run
self._check_banner()
File "/home/centos/src/project_data/federation_pit-1175/firesim/.conda-env/lib/python3.9/site-packages/paramiko/transport.py", line 2053, in _check_banner
raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer
Fatal error: Needed to prompt for a connection or sudo password (host: 10.2.0.5), but input would be ambiguous in parallel mode
Aborting.
I have env.connection_attempts = 10 set, but I only see three nested exceptions. I'm also using key-based auth. The last one is:
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer
I'm wondering whether the SSHException message is ending up with extra text appended to it. If we changed the referenced code to check 'Error reading SSH protocol banner' in msg,
it would correctly retry in this case. @ploxiln would you consider this a 'bug fix' or am I reaching too far?
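To make the proposal concrete, here is a minimal sketch comparing the strict equality check with the substring check, run against the two SSHException messages from the stack trace above. The helper names are hypothetical and exist only for illustration; they are not taken from the referenced code.

```python
# Hypothetical helpers; the names are illustrative, not from the referenced code.
BANNER_ERROR = "Error reading SSH protocol banner"


def matches_exactly(msg: str) -> bool:
    """Strict check: only the bare message qualifies for a retry."""
    return msg == BANNER_ERROR


def matches_substring(msg: str) -> bool:
    """Relaxed check: the message may carry extra text after the prefix."""
    return BANNER_ERROR in msg


# The two SSHException messages that appear in the stack trace.
bare = "Error reading SSH protocol banner"
with_cause = "Error reading SSH protocol banner[Errno 104] Connection reset by peer"

for msg in (bare, with_cause):
    print(matches_exactly(msg), matches_substring(msg))
```

The strict check accepts the bare message but rejects the one with the "[Errno 104] Connection reset by peer" suffix, while the substring check accepts both.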
Looking at the paramiko-ng code in question (https://github.com/ploxiln/paramiko-ng/blob/b2322db80f55c9b07e518e555ece6284d4577cf0/paramiko/transport.py#L2039-L2041), it does seem to stringify the underlying exception, so the message will only start with 'Error reading SSH protocol banner'.
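A rough sketch of why the two messages in the trace differ, assuming the message is built by concatenating the fixed prefix with str() of the underlying exception, as described above. The function banner_error_message is a hypothetical stand-in, not paramiko-ng's actual code.

```python
PREFIX = "Error reading SSH protocol banner"


def banner_error_message(cause: Exception) -> str:
    """Hypothetical stand-in: fixed prefix plus the stringified cause."""
    return PREFIX + str(cause)


# An EOFError raised with no arguments stringifies to the empty string,
# so the resulting message equals the bare prefix.
eof_msg = banner_error_message(EOFError())

# A connection reset carries an errno and a description in its string form,
# so the resulting message only starts with the prefix.
reset_msg = banner_error_message(
    ConnectionResetError(104, "Connection reset by peer")
)

print(repr(eof_msg))
print(repr(reset_msg))
```

Under that assumption, the EOFError case passes an exact-match check, but the ConnectionResetError case does not, which matches the behavior seen in the trace.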