ploxiln/paramiko-ng

Deadlock in paramiko-ng 2.8.10

Opened this issue · 6 comments

Ran into a deadlock issue. We recently switched from paramiko to paramiko-ng. Not sure if the same problem existed before. There were a large number of threads (about 125) using paramiko at the same time.

  • The application is a pytest based testing framework
  • It calls paramiko.util.log_to_file() to send log output to a file

Almost all the threads using paramiko are stuck at the same place in python logging module as shown below.

Threads stuck here - an example:
  File "/usr/local/lib/python3.7/dist-packages/paramiko/packet.py", line 307, in read_all
    self._check_keepalive()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/packet.py", line 566, in _check_keepalive
    self.__keepalive_callback()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/transport.py", line 1043, in _request
    return x.global_request('keepalive@lag.net', wait=False)
  File "/usr/local/lib/python3.7/dist-packages/paramiko/transport.py", line 1071, in global_request
    self._log(DEBUG, 'Sending global request "%r"', kind)
  File "/usr/local/lib/python3.7/dist-packages/paramiko/transport.py", line 1697, in _log
    self.logger.log(level, msg, *args, **kwargs)
  File "/usr/lib/python3.7/logging/__init__.py", line 1449, in log
    self._log(level, msg, args, **kwargs)
  File "/usr/lib/python3.7/logging/__init__.py", line 1519, in _log
    self.handle(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 1529, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 1591, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 903, in handle
    self.acquire()
  File "/usr/lib/python3.7/logging/__init__.py", line 854, in acquire
    self.lock.acquire()

Looks like this thread is the culprit.
It first acquired the logging module's lock and got stuck acquiring paramiko channel's lock.
Some other thread (that's stuck in the logging module) must have acquired the paramiko lock this thread is stuck on,
leading to a deadlock.

The self.stream.flush() below is supposed to be flushing the stream handler set up by paramiko.util.log_to_file(),
but somehow it ends up in  paramiko channel deletion code.

2022-12-10 12:55:00 (UTC) | 7009 |    DEBUG :         | ThreadPoolExecutor-2_0 | Stacktrace of <paramiko.Transport at 0x1401f898 (cipher aes128-ctr, 128 bits) (active; 2 open channel(s))>:
  File "/usr/lib/python3.7/threading.py", line 885, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/transport.py", line 1950, in run
    self._channel_handler_table[ptype](chan, m)
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 1174, in _handle_close
    msgs = self._close_internal()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 1247, in _close_internal
    m1 = self._eof_internal()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 1240, in _eof_internal
    self._log(DEBUG, 'EOF sent ({})'.format(self._name))
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 1204, in _log
    self.logger.log(level, "[chan " + self._name + "] " + msg, *args)
  File "/usr/lib/python3.7/logging/__init__.py", line 1449, in log
    self._log(level, msg, args, **kwargs)
  File "/usr/lib/python3.7/logging/__init__.py", line 1519, in _log
    self.handle(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 1529, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 1591, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 905, in handle
    self.emit(record)
  File "/usr/lib/python3.7/logging/__init__.py", line 1038, in emit
    self.flush()
  File "/usr/lib/python3.7/logging/__init__.py", line 1018, in flush
    self.stream.flush()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/file.py", line 66, in __del__
    self.close()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 1404, in close
    self.channel.shutdown_write()
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 980, in shutdown_write
    self.shutdown(1)
  File "/usr/local/lib/python3.7/dist-packages/paramiko/channel.py", line 949, in shutdown
    self.lock.acquire()

This is not a deadlock between 2 threads. The transport thread tried to acquire a channel lock a second time, while holding the logging lock, blocking itself & all other threads trying to log.

Very interesting ... __del__ is the destructor which is called automatically before the object (BufferedFile) is freed, possibly because the reference count decremented to zero. Was a logging argument the very last reference to the Channel's BufferedFile? The Channel was already in the process of closing ...

log_to_file() was called by the app to set up paramiko logging to a file.
So paramiko logging should always be writing to that file.
It should not be writing to or deleting a channel's BufferedFile.
How did the logger instance used by paramiko become associated with a channel's BufferedFile?

I don't think the logger instance was explicitly associated with the channel BufferedFile. It still seems to me like the python GC decided to "free" the BufferedFile at that point in the logging code, causing its destructor method to be called. It has been said to be inadvisable to put any significant logic in destructors, and paramiko channel locking has caused tricky issues before.

Anyway, I think the thing to do, is to make the channel close logic detect when it's already closing, and skip it before trying to take that channel lock, in some reasonably elegant way ...

This problem happens with moderate frequency in our test environment. Every time the stacktrace is the same - happens when handling MSG_CHANNEL_CLOSE.

@ploxiln Are you planning on fixing this issue?
This other issue also seems to be related - during object destruction:
#148

A more general question: There hasn't been a new release since Jan 2022. Would it be fair to conclude that paramiko-ng will not be actively managed going forward?