aio-libs-abandoned/aioredis-py

(aioredis v2) "Bad file descriptor" / "Event loop is closed" when shutting down event loop

egnor opened this issue · 13 comments

egnor commented

I have a simple app which

  • launches a main with asyncio.run(main())
  • in that main, creates a client with redis = aioredis.from_url(...) (using SSL)
  • does some things with the client
  • shuts down the client with await redis.close()
  • returns from the main, at which point asyncio.run stops the event loop

This causes the following exception to be logged as WARNING (not thrown!) on exit:

⚠️   asyncio: Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7fa348eb8790>
transport: <_SelectorSocketTransport closing fd=6>
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 918, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/sslproto.py", line 684, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 924, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 719, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 731, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 746, in call_soon
    self._check_closed()
  File "/usr/lib/python3.9/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

Note that while you may consider the extra log a non-issue, Bad file descriptor errors are always a bit scary to me, since there's always the possibility that fd could have been reused elsewhere in the process, and then random data would be written to a random fd, causing arbitrary corruption and bad data.

As for the root cause, there is some complexity here and you will find a number of threads online about SSL connection shutdown and event loop shutdown. The consensus seems to be that, at present at least, all connections need to be fully closed (with wait_closed() awaited for) before the event loop is shut down. This is arguably a bug depending on who you ask, but it is the current state of asyncio in any case.

In aioredis, Connection.disconnect does in fact do that. However, Redis.close simply returns its active connection to its connection pool and does not disconnect it. The connection pool itself is left alone.

This makes sense because the connection pool may be shared with other Redis instances and other users. However, in this case, the connection pool was created in the Redis constructor, so there's no opportunity to disconnect it. If I add a direct call like this:

await redis.close()  # return connection to pool
await redis.connection_pool.disconnect()  # shut down all connections in pool

Then, the error doesn't happen. But it feels like the connection_pool is an internal detail that Redis should be managing, if it was created internally and not passed in?

Haven't taken much time to look around our source code for validation and it's the night so I'll let others give input. For now, my response: In Django, there is a call done for connections where they explicitly close connections for the developer. Whatever the service you're using may not have what Django does. Maybe you do need to explicitly call the disconnect yourself, but I'm not entirely sure: https://github.com/django/django/blob/f331eba6d576752dd79c4b37c41d981daa537fe6/django/core/cache/__init__.py#L59-L66

Hope that somewhat answers it unless I'm totally brain dead at the moment.

@egnor Can you post example code that we can use to reproduce?

egnor commented

@Andrew-Chen-Wang I'm not using a "service" at all, this is a simple command line tool.

@abrookins I'm working on making a standalone example now!

egnor commented

Here's an example that demos the issue. You'll need to set $REDIS_URL to a database of some kind; this will scan and dump all the keys and values, so probably don't use a big database, or modify the code to add a match expression, or something. Uncomment the line after the bug ref to show my workaround.

The redis instance I'm using is quite some distance away (on Heroku), it's possible the higher network latency is part of triggering this issue. If you can't repro it with your own test databases, I can give one of you credentials to my database instance (it's not all that sensitive, just a hobby project).

[note: edited to remove some value-printing code that would fail on an empty database]

#!/usr/bin/env python3

import asyncio
import os
import sys

import aioredis


async def main() -> None:
    redis_url = os.environ.get("REDIS_TLS_URL") or os.environ["REDIS_URL"]
    redis = aioredis.from_url(redis_url, ssl_cert_reqs=None)

    cursor = None
    while cursor != 0:
        cursor, keys = await redis.scan(cursor=cursor or 0)
        for key in keys:
            print(f'key: {key.decode()}')

    await redis.close()

    # https://github.com/aio-libs/aioredis-py/issues/1078
    # await redis.connection_pool.disconnect()


sys.exit(asyncio.run(main()))

@egnor I tried running your example with local Redis running (fyi had to remove ssl_cert_reqs=None because I wasn't using SSL with your Heroku instance). First thing that happened was keys returned an empty list which resulted in redis.mget raising a TypeError saying it required one positional argument (since keys was empty, there were no value to pass). I did check that cursor was an int 0.

Maybe wrap your for loop zip with a try except TypeError.

However your original comment:

SSL connection shutdown and event loop shutdown. The consensus seems to be that, at present at least, all connections need to be fully closed (with wait_closed() awaited for)

is something that may need looking into here.

egnor commented

Ah, empty database I guess? Anyway I removed the value-fetching part entirely (editing the post above), that's not important.

Also, I suspect this requires an SSL instance to repro, since it's the SSL stack in particular that writes data as part of a closing handshake.

Interesting issue. I'm not able to reproduce with an SSL connection to a not-super-close Redis instance, myself, running your example code on Python 3.9.5. What version of Python do you have?

If you want to send me credentials, you can email me at andrew.brookins@redislabs.com, or you can share them with me via 1Password or really any way you choose!

@egnor Also, what version of Python are you using?

egnor commented
% python --version
Python 3.9.5

I just verified it still repro's. Will send credentials. Thanks for looking into it!

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
可以替代asyncio.run(main())
解决Event loop is closed

@egnor please let me know if this is still happening on master branch. if so, do you mind telling me the exact number of seconds this error takes to occur from start of main to end?

egnor commented

Sorry for going AWOL! This project ended and I haven't worked with the codebase in a while. So unless you have other people with this problem, we can just close this unless it reappears somewhere, or someone else has this problem?

@LWS-CSDN I am using Google Translate to understand you, but I think you're suggesting to replace asyncio.run() with loop.run_until_complete() to avoid this issue. I think that's mostly brushing it under the rug by avoiding destruction of the event loop...? But does that mean you encountered this issue?

@egnor thanks for the speedy reply! I'm tempted to close this, but redis-py is introducing further SSL tests, so I'll check to see if this issue could be potentially resolved via their testing and any introduction of new code.

Works great for in my case! Thank YOU! @LWS-CSDN

Fix log

Exception

[2022-07-07 11:11:11,111: ERROR/ForkPoolWorker-2] Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x123>
transport: <_SelectorSocketTransport closing fd=135>
Traceback (most recent call last):
File "/usr/lib/python3.9/asyncio/selector_events.py", line 918, in write
n = self._sock.send(data)
File "/home/user/project/venv/lib/python3.9/site-packages/gevent/_socketcommon.py", line 722, in send
return self._sock.send(data, flags)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.9/asyncio/sslproto.py", line 684, in _process_write_backlog
self._transport.write(chunk)
File "/usr/lib/python3.9/asyncio/selector_events.py", line 924, in write
self._fatal_error(exc, 'Fatal write error on socket transport')
File "/usr/lib/python3.9/asyncio/selector_events.py", line 719, in _fatal_error
self._force_close(exc)
File "/usr/lib/python3.9/asyncio/selector_events.py", line 731, in _force_close
self._loop.call_soon(self._call_connection_lost, exc)
File "/usr/lib/python3.9/asyncio/base_events.py", line 746, in call_soon
self._check_closed()
File "/usr/lib/python3.9/asyncio/base_events.py", line 510, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

Fix

Use

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

instead of

asyncio.run(main())

some tags for search engines:

aiogram, asyncio, celery, django