Handle Panda disconnect exceptions more elegantly
evalott100 commented
During a late-night run of I22 on @gilesknap's container, the following errors were repeated many times:
```
ERROR:PandA did not respond to GetChanges within 1.0 seconds. Setting all records to major alarm state.
callbackRequest: ERROR cbLow ring buffer full
callbackRequest: ERROR cbLow ring buffer full
WARNING:socket.send() raised exception.
ERROR:Task exception was never retrieved
future: <Task finished name='Task-68034730' coro=<StreamWriter.drain() done, defined at /usr/lib/python3.10/asyncio/streams.py:348> exception=BrokenPipeError(32, 'Broken pipe')>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/usr/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/usr/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  [Previous line repeated 33623 more times]
  File "/venv/lib/python3.10/site-packages/pandablocks/asyncio.py", line 103, in _ctrl_read_forever
    received = await reader.read(4096)
  File "/usr/lib/python3.10/asyncio/streams.py", line 650, in read
    raise self._exception
  File "/usr/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/usr/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
```
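The `ERROR:Task exception was never retrieved` line is what asyncio logs when a Task that finished with an exception is garbage-collected without anything having awaited it or called `exception()` on it; here the orphaned task is a `StreamWriter.drain()` call on the dead socket. The sketch below is an illustration only, not pandablocks code: `failing_drain` and `log_task_failure` are hypothetical names. It shows how a done-callback that calls `task.exception()` retrieves the error, so it is reported once when the task fails instead of being dropped.

```python
import asyncio
import logging
from typing import Optional


def log_task_failure(task: asyncio.Task) -> None:
    """Done-callback: retrieve and log a finished task's exception, if any."""
    if task.cancelled():
        return
    exc = task.exception()  # marks the exception as retrieved
    if exc is not None:
        logging.error("Task %s failed", task.get_name(), exc_info=exc)


async def failing_drain() -> None:
    # Hypothetical stand-in for StreamWriter.drain() on a closed socket.
    raise BrokenPipeError(32, "Broken pipe")


async def main() -> Optional[BaseException]:
    task = asyncio.create_task(failing_drain(), name="drain")
    task.add_done_callback(log_task_failure)
    await asyncio.wait([task])  # waits for completion without re-raising
    return task.exception()


if __name__ == "__main__":
    print(repr(asyncio.run(main())))
```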
We should improve this section of code in `PandABlocks-client/src/pandablocks/asyncio.py` (lines 110 to 111 at commit 3638f0d):
```python
try:
    ...  # existing handling of `received`
except BrokenPipeError:
    logging.exception(f"Error handling '{received.decode()}'")
    await asyncio.sleep(...)  # <wait more time before trying again>
# ...
# Catch other errors the PandA should be able to handle
# ...
except Exception:
    raise  # bare raise re-raises with the original traceback
```
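A runnable version of the proposed retry path might look like the sketch below. It assumes the failing call can be wrapped as a zero-argument coroutine function; `retry_on_broken_pipe`, `delay` and `max_attempts` are hypothetical names, and the default values are placeholders because the issue leaves the wait time open. Exceptions other than `BrokenPipeError` are simply not caught, so they propagate unchanged, which has the same effect as an explicit `except Exception: raise`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_on_broken_pipe(
    operation: Callable[[], Awaitable[T]],
    delay: float = 5.0,  # placeholder: the issue leaves the wait time open
    max_attempts: int = 3,  # placeholder cap so a dead PandA is not retried forever
) -> T:
    """Await operation(); on BrokenPipeError, log, sleep `delay` seconds and retry.

    Any other exception is not caught here and propagates unchanged.
    """
    attempt = 1
    while True:
        try:
            return await operation()
        except BrokenPipeError:
            logging.exception("Broken pipe (attempt %d of %d)", attempt, max_attempts)
            if attempt >= max_attempts:
                raise  # give up and let the caller decide what to do
            attempt += 1
            await asyncio.sleep(delay)
```

Bounding the number of attempts keeps a permanently dead connection from being retried forever; once the cap is reached the error reaches the caller.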
@coretl Thoughts?
Update
We agreed in a meeting that it is probably best to shut down the pandablocks-ioc completely on such a failure and let the Kubernetes liveness check (`liveness.sh`) handle restarting it.