Socket leak after TimeoutError
cellscape opened this issue · 6 comments
Long story short
In very rare cases, when the remote server is heavily overloaded, the aiohttp client leaks socket file descriptors on timeout.
Expected behaviour
No leaked file descriptors on timeout.
Actual behaviour
aiohttp client leaks FDs from timed out connections.
Steps to reproduce
I don't have a way to reproduce the situation yet, but I have a backtrace and the code that generates it. It's from a long-running HTTP client daemon which periodically fetches some resources.
try:
    connector = aiohttp.connector.TCPConnector(
        resolver=Resolver(host, ip), verify_ssl=self.verify_ssl)
    with aiohttp.ClientSession(connector=connector) as client:
        r = yield from client.request(self.method, url, headers=headers)
        ...
except Exception as e:
    self.logger.info('failed to fetch', exc_info=e)
info 2017-02-22 23:30:20,927 : INFO : failed to fetch
Traceback (most recent call last):
  File "/opt/fetcher/lib/python3.4/site-packages/fetcher/base.py", line 126, in _do_fetch
    r = yield from client.request(self.method, url, headers=headers)
  File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/client.py", line 577, in __iter__
    resp = yield from self._coro
  File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/client.py", line 274, in _request
    break
  File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/helpers.py", line 765, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
The issue happens very rarely, when the remote server is heavily overloaded and the client sees all kinds of failures, including refused connections, resets and timeouts. There are no leaked FDs with other exceptions, only with this one.
Connections are made over HTTPS; the remote server is Apache/2.4.25 (Amazon) OpenSSL/1.0.1k-fips PHP/5.6.29. Initially I noticed the leaks with aiohttp 1.1.1 and thought they were caused by #1568, but upgrading to 1.3.3 didn't help.
Currently I'm running it with PYTHONASYNCIODEBUG=1, but no such timeouts have occurred so far. Any ideas how to debug this?
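One way to watch for the leak while waiting for it to recur is to log the process's open descriptor count around each fetch. A minimal sketch, assuming a Linux-style /proc/self/fd directory (with /dev/fd as a fallback); count_open_fds is a hypothetical helper, not part of aiohttp or asyncio:

```python
import os
import socket


def count_open_fds():
    """Return the number of file descriptors currently open in this process.

    Assumption: the platform exposes /proc/self/fd or /dev/fd. The listing
    itself briefly opens one descriptor, so compare counts, not absolutes.
    """
    for fd_dir in ('/proc/self/fd', '/dev/fd'):
        try:
            return len(os.listdir(fd_dir))
        except OSError:
            continue
    raise RuntimeError('no per-process fd directory available')


# Usage: the count should return to its baseline once a socket is closed.
baseline = count_open_fds()
sock = socket.socket()
with_socket = count_open_fds()
sock.close()
after_close = count_open_fds()
print(baseline, with_socket, after_close)
```

Logging this value before and after each failed fetch would show whether the count keeps growing only after TimeoutError, which is what the report above suggests.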
Your environment
aiohttp 1.3.3, Debian Jessie, Python 3.4
If it's a bug in Python 3.5 and up, why am I affected on 3.4? Or do I misunderstand something?
Upgrading to Python 3.5 wouldn't be easy because there are no official 3.5 packages for Jessie, and the issue happens only in production, where I don't want to mess with packages. I'll try to at least replicate the issue with 3.5 on a test system.
Maybe I could just try to apply the patch from https://bugs.python.org/issue29406 to asyncio in 3.4? Would that make sense?
That patch is for 3.5. Let me review the timeout-related code.
I think this is a bug in the asyncio._SelectorSslTransport implementation.
If the SSL handshake takes longer than the timeout, aiohttp cancels the loop.create_connection() call. This cancels the waiter future, but the SSL transport's handshake implementation does not check for a closing state and continues the handshake.
I think the new SSL transport implementation from Python 3.5 suffers from the same problem.
Technically, it is not related to TimeoutError. It is a problem with create_connection() cancellation during the handshake process.
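The scenario described above (a timeout cancelling create_connection() while the TLS handshake is still pending) can be sketched in isolation. This is only an illustration under assumptions: it uses modern async/await syntax rather than the Python 3.4 generator style, a local server that accepts the TCP connection and then stays silent stands in for the overloaded remote, and stalled_handshake_demo and silent_peer are hypothetical names. On Python versions that contain the fix it should time out without leaking; it reproduces the cancellation, not the leak itself.

```python
import asyncio
import ssl


async def stalled_handshake_demo(timeout=0.2):
    """Cancel create_connection() while the TLS handshake is still pending."""
    loop = asyncio.get_running_loop()
    accepted = []  # keep server-side writers so the peer sockets stay open

    async def silent_peer(reader, writer):
        # Accept the TCP connection but never answer the TLS ClientHello,
        # standing in for a server too overloaded to finish the handshake.
        accepted.append(writer)

    server = await asyncio.start_server(silent_peer, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]

    ctx = ssl.create_default_context()
    try:
        # wait_for() cancels create_connection() once the timeout expires,
        # which is the point where the handshake is interrupted.
        await asyncio.wait_for(
            loop.create_connection(asyncio.Protocol, '127.0.0.1', port,
                                   ssl=ctx, server_hostname='localhost'),
            timeout)
        outcome = 'connected'
    except asyncio.TimeoutError:
        outcome = 'timeout'
    finally:
        for writer in accepted:
            writer.close()
        server.close()
        await server.wait_closed()
    return outcome


result = asyncio.run(stalled_handshake_demo())
print(result)
```

Comparing the process's open descriptor count before and after a run like this is one way to check whether a given Python build still leaks the client socket.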
This will be fixed in the next Python bugfix release.