noirello/bonsai

Why connection.close() can raise an exception?

dolfinus opened this issue · 13 comments

I'm using AIOConnectionPool and try to check its behavior if LDAP server become unavailable.

Initial setup - I've started LDAP server on localhost, and try to search for a specific user:

Run local LDAP server

Using https://github.com/docker-ThoTeam/slapd-server-mock:

docker run --rm -v ./data.ldif.TEMPLATE:/bootstrap/data.ldif.TEMPLATE -p 389:389 thoteam/slapd-server-mock
>>> import logging
>>> from bonsai import LDAPClient, LDAPError, LDAPSearchScope
>>> from bonsai.asyncio import AIOConnectionPool

>>> log = logging.getLogger()

>>> client = LDAPClient("ldap://localhost")
>>> pool = AIOConnectionPool(client)

>>> async with pool.spawn() as connection:
>>>    results = await connection.search(base="ou=people,dc=ldapmock,dc=local", filter_exp="uid=developer1", scope=LDAPSearchScope.SUBTREE)

>>> results
[{'dn': <LDAPDN uid=developer1,ou=people,dc=ldapmock,dc=local>, 'objectClass': ['inetOrgPerson', 'organizationalPerson', 'person', 'top'], 'cn': ['developer One'], 'sn': ['One'], 'givenName': ['developer'], 'mail': ['developer.one@ldapmock.local'], 'uid': ['developer1']}]

>>> connection
<bonsai.asyncio.aioconnection.AIOLDAPConnection object at 0x7f922fbd3770>
>>> connection.closed
False

Pool now has an opened connection. Stop LDAP server, and try to search for user again

try:
    async with pool.spawn() as connection:
        try:
            results = await connection.search(base="ou=people,dc=ldapmock,dc=local", filter_exp="uid=developer1", scope=LDAPSearchScope.SUBTREE)
        except LDAPError:
            log.exception("Exception during search")
            connection.close()
except LDAPError:
    log.error("Exception during spawn")
    raise

Calling connection.close() allows to skip returning connection back to pool, so next access to the same pool will initialize a new connection: https://github.com/noirello/bonsai/blob/74ce8f8548a04ab763a21ffaf4bef3ef529ced18/docs/advanced.rst#connection-timeouts

But connection.close() raises an exception, so connection is not being closed:

Exception during search
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 59, in _poll
    raise exc
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 54, in _poll
    return await asyncio.wait_for(fut, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 40, in _ready
    res = super().get_result(msg_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])
Exception during spawn
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 59, in _poll
    raise exc
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 54, in _poll
    return await asyncio.wait_for(fut, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 40, in _ready
    res = super().get_result(msg_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "<console>", line 7, in <module>
bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])

>>> connection
<bonsai.asyncio.aioconnection.AIOLDAPConnection object at 0x7f922fbd3770>
>>> connection.closed
False

Because connection.close() raised an exception, it looks like it haven't changed connection.closed flag to True. This connection is broken now, it cannot be used for making any requests.

For example, start LDAP server again and use the same pool. Because connection.closed is not True, it is still in the pool and being reused:

try:
    async with pool.spawn() as connection:
        try:
            results = await connection.search(base="ou=people,dc=ldapmock,dc=local", filter_exp="uid=developer1", scope=LDAPSearchScope.SUBTREE)
        except LDAPError:
            log.exception("Exception during search")
            connection.close()
except LDAPError:
    log.error("Exception during spawn")
    raise

Exception during search
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 59, in _poll
    raise exc
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 54, in _poll
    return await asyncio.wait_for(fut, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/maxim/Repo/horizon/.venv/lib/python3.11/site-packages/bonsai/asyncio/aioconnection.py", line 40, in _ready
    res = super().get_result(msg_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
bonsai.errors.LDAPError: Success. (0x0000 [0])

Well, something is definitely went wrong.
Fortunately, connection now become closed, so next requests made using the same pool working as expected:

>>> connection
<bonsai.asyncio.aioconnection.AIOLDAPConnection object at 0x7f922fbd3770>
>>> connection.closed
True

>>> try:
...     async with pool.spawn() as connection:
...         try:
...             results = await connection.search(base="ou=people,dc=ldapmock,dc=local", filter_exp="uid=developer1", scope=LDAPSearchScope.SUBTREE)
...         except LDAPError:
..             log.exception("Exception during search")
...             connection.close()
... except LDAPError:
...     log.error("Exception during spawn")
...     raise

>>> results
[{'dn': <LDAPDN uid=developer1,ou=people,dc=ldapmock,dc=local>, 'objectClass': ['inetOrgPerson', 'organizationalPerson', 'person', 'top'], 'cn': ['developer One'], 'sn': ['One'], 'givenName': ['developer'], 'mail': ['developer.one@ldapmock.local'], 'uid': ['developer1']}]

>>> connection
<bonsai.asyncio.aioconnection.AIOLDAPConnection object at 0x7f922effc670>
>>> connection.closed
False

Few more things bother me:

  1. connection.close() is synchronous in both LDAPConnection and AIOLDAPConnection, but it looks like it try to send network request to LDAP server for properly unbind/disconnect instead of just dropping connection. Current implementation of AIOConnectionPool.close() is asynchronous, and it calls this synchronous method, so it can block event loop:

    async def close(self) -> None:

    def close(self) -> None:

  2. Current Pool.close() implementation calls conn.close() for all connections using a cycle. So if some connection raises an exception during close, other connections in the pool may left unclosed. The only workaround I could find is:

for _ in range(pool.shared_connections):
    with contextlib.suppress(LDAPError):
        pool.close()

Which does not look right.

Proposed solution:

  1. Change connection.close() implementation to never raise any exceptions, if that even possible.
  2. If AIOLDAPConnection.close() requires some request to be made to LDAP server (unbind, explicit connection close), make it async, and use await in AIOConnectionPool.close()
  3. If establishing the connection is failed with any LDAPError (not just ConnectionError, but also AuthenticationError, InvalidDN, TimeoutError and so on), immediately mark connection as closed without requiring to call connection.close() explicitly. This connection cannot be used anymore.
  4. Change ConnectionPool.close() implementation to pop connection from set, e.g.:
def close(self) -> None:
    while self._idles:
        with suppress(Exception):
            conn = self._idles.pop()
            conn.close()
    while self._used:
        with suppress(Exception):
            conn = self._used.pop()
            conn.close()
    self._closed = True

To avoid partial cleanup of the pool if some connection cannot be closed.

Yes, the close method does more, than it probably should. Before the unbind, it sends out abandon requests for the unfinished operations. These are kind of a fire-and-forget requests, that's why the close method isn't awaitable. There aren't any message ids that can be polled.

One idea is to add an optional parameter to the close method that'd skip sending abandon requests.

If fire-and-forget means that event loop is not blocked, it's fine to leave this method synchronous.

But it should properly close connection in any case, without leaving it half-closed if such request caused an exception. Adding optional parameter is just sweeping the problem under the rug.

So?

This is the idea, that I was talking about: dev...optional-abandon

This way the close method wouldn't do extra abandon operations by default, and it has fewer chances to raise error.

Why not just remove this line?

set_exception(self->ld, rc);

And replace it with error/debug message

It seems a too drastic measure for me at that level, that could easily lead silent fails (simple messages can be overlooked).

I've changed the ConnectionPool.close() almost as you suggested in your proposed solutions.

It seems a too drastic measure for me at that level, that could easily lead silent fails

Em, connection is getting closed, it can't be used anymore in any case.

I've checked that you can call close on a disconnected/stopped LDAP server without errors.

Also looked through the OpenLDAP source code, and it seems to me that ldap_unbind_ext always returns LDAP_SUCCESS. (There's some client control freeing part that I didn't check, but doesn't matter here, there's no control set).

But I still think it's good to keep the exception setting in place. The unbind implementation could change anytime, and the function signature suggests that it could return with an error.

IMHO the problem here in this case can be boiled down to two things:

  • The LDAPConnection.close method originally did more, than it was supposed to do by default: calling abandon on on-going requests. This is what's likely caused the raised LDAPError exception, and this is addressed on the branch.
  • There's no proper way currently to identify broken connection in the pool. Calling close on connection is a suboptimal solution that's come up here, when this issue was first raised.

OK, I've tried to reproduce the error on this branch, using issue description, and it looks like connection is closed properly:

Open connection pool, then stop LDAP server and call:

>>> try:
...     async with pool.spawn() as connection:
...         try:
...             results = await connection.search(base="ou=people,dc=ldapmock,dc=local", filter_exp="uid=developer1", scope=LDAPSearchScope.SUBTREE)
...         except LDAPError:
...             log.exception("Exception during search")
...             connection.close()
... except LDAPError:
...     log.error("Exception during spawn")
...     raise
... 
Exception during search
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/home/maxim/Repo/bonsai/venv/lib/python3.12/site-packages/bonsai/asyncio/aioconnection.py", line 59, in _poll
    raise exc
  File "/home/maxim/Repo/bonsai/venv/lib/python3.12/site-packages/bonsai/asyncio/aioconnection.py", line 54, in _poll
    return await asyncio.wait_for(fut, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maxim/.pyenv/versions/3.12.0/lib/python3.12/asyncio/tasks.py", line 510, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/maxim/Repo/bonsai/venv/lib/python3.12/site-packages/bonsai/asyncio/aioconnection.py", line 40, in _ready
    res = super().get_result(msg_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
bonsai.errors.ConnectionError: Can't contact LDAP server. (0xFFFF [-1])
>>> connection.closed
True

Thank you!

The LDAPConnection.close method originally did more, than it was supposed to do by default: calling abandon on on-going requests. This is what's likely caused the raised LDAPError exception, and this is addressed on the branch.

One more thing here - according to https://linux.die.net/man/3/ldap_unbind_ext (Unbinding section):
The ldap_unbind_s() call is just another name for ldap_unbind(); both of these calls are synchronous in nature.

Is it syncronous in terms of asyncio as well? I mean does this line block the event loop, and prevents other coroutines to be started while calling await connection.close()?

async with self._lock:
super().close()

There's no proper way currently to identify broken connection in the pool.

If some ldap-related function call returned LDAP_SERVER_DOWN because LDAP server is not available, and then LDAP server is ready again, is it possible to use the same opened connection to the same LDAP server? Does returning LDAP_SERVER_DOWN leads to self->socketpair or ssock in state which cannot be used anymore?

One more thing here - according to https://linux.die.net/man/3/ldap_unbind_ext (Unbinding section):
The ldap_unbind_s() call is just another name for ldap_unbind(); both of these calls are synchronous in nature.

The ldap_unbind_ext does call a write system command before closing the socket. So yes, if there's not enough buffer in the kernel space that method could block. Setting bonsai.set_connect_async(True) might help, but that could lead other problems when TLS is used. The async part of the libldap library's interface works well with operations (add, delete, search etc.), but when it comes to connection handling, it isn't exactly the best.

If some ldap-related function call returned LDAP_SERVER_DOWN because LDAP server is not available, and then LDAP server is ready again, is it possible to use the same opened connection to the same LDAP server

It seems it doesn't possible to reuse the connection.

Ok, what's next? What do you think about merging optional-abandon branch to dev/master?

I merged the branch to dev.

Thanks!