aio-libs/aiohttp

Socket leak after TimeoutError

cellscape opened this issue · 6 comments

Long story short

In very rare cases, when the remote server is heavily overloaded, the aiohttp client leaks socket file descriptors on timeout.

Expected behaviour

No leaked file descriptors on timeout.

Actual behaviour

aiohttp client leaks FDs from timed out connections.

Steps to reproduce

I don't have a way to reproduce the situation yet, but I have a backtrace and the code that generates it. It's from a long-running HTTP client daemon which periodically fetches some resources.

try:
    connector = aiohttp.connector.TCPConnector(
        resolver=Resolver(host, ip), verify_ssl=self.verify_ssl)
    with aiohttp.ClientSession(connector=connector) as client:
        r = yield from client.request(self.method, url, headers=headers)
        ...
except Exception as e:
    self.logger.info('failed to fetch', exc_info=e)

info 2017-02-22 23:30:20,927 : INFO : failed to fetch
Traceback (most recent call last):
File "/opt/fetcher/lib/python3.4/site-packages/fetcher/base.py", line 126, in _do_fetch
  r = yield from client.request(self.method, url, headers=headers)
File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/client.py", line 577, in __iter__
  resp = yield from self._coro
File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/client.py", line 274, in _request
  break
File "/opt/fetcher/lib/python3.4/site-packages/aiohttp/helpers.py", line 765, in __exit__
  raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

The issue happens very rarely, when the remote server is heavily overloaded and the client sees all kinds of failures, including refused connections, resets and timeouts. There are no leaked FDs with other exceptions, only with this one.

Connections happen over HTTPS; the remote server is Apache/2.4.25 (Amazon) OpenSSL/1.0.1k-fips PHP/5.6.29. Initially I noticed leaks with aiohttp 1.1.1 and thought they were caused by #1568, but upgrading to 1.3.3 didn't help.

Currently I'm running it with PYTHONASYNCIODEBUG=1, but there have been no such timeouts so far. Any ideas how to debug that?
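One way to watch for the leak while waiting for it to recur might be to count the process's socket descriptors periodically and log the number. The following is only a sketch: `count_socket_fds` and its `max_fd` bound are hypothetical names, not part of aiohttp or asyncio, and it assumes a Unix-like system where `os.fstat` works on any open descriptor.

```python
import os
import socket
import stat


def count_socket_fds(max_fd=4096):
    """Count open file descriptors of this process that refer to sockets.

    max_fd is an assumed upper bound on descriptor numbers; raise it if
    the process limit (ulimit -n) is higher.
    """
    count = 0
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            # This descriptor number is not open in the process.
            continue
        if stat.S_ISSOCK(mode):
            count += 1
    return count


if __name__ == "__main__":
    before = count_socket_fds()
    s = socket.socket()  # opens exactly one new socket descriptor
    print("sockets before:", before, "after:", count_socket_fds())
    s.close()
```

Logging this value next to each 'failed to fetch' message would show whether the count grows only after a TimeoutError.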

Your environment

aiohttp 1.3.3, Debian Jessie, Python 3.4

#1568 is a bug in Python 3.5 and up. Could you upgrade Python?

If it's a bug in Python 3.5 and up, why am I affected with 3.4? Or do I misunderstand something?

Upgrading to Python 3.5 wouldn't be easy because there are no official 3.5 packages for Jessie, and the issue happens only in production; I don't want to mess with packages there. I'll try to at least replicate the issue with 3.5 on a test system.

Maybe I could just try to apply the patch from https://bugs.python.org/issue29406 to asyncio in 3.4? Would this make sense?

That patch is for 3.5. Let me review the timeout-related code.

I think this is a bug in the asyncio._SelectorSslTransport implementation.

If the SSL handshake takes longer than the timeout, aiohttp cancels the loop.create_connection() call. This cancels the waiter future, but the SSL transport's handshake implementation does not check for a closing state and continues the handshake.

I think the new SSL transport implementation from Python 3.5 suffers from the same problem.

Technically, it is not related to TimeoutError. It is a problem with create_connection() cancellation during the handshake process.
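The failure mode described here can be illustrated with a toy model. This is not asyncio's actual transport code: `FakeSslTransport`, `check_waiter` and `sock_open` are hypothetical names invented for the sketch, and it uses modern `async`/`await` syntax rather than the `yield from` style of Python 3.4. It only assumes that a timeout cancels the waiter future while the handshake steps keep being scheduled on the loop.

```python
import asyncio


class FakeSslTransport:
    """Hypothetical stand-in for an SSL transport (not asyncio's class)."""

    def __init__(self, loop, waiter, check_waiter):
        self._loop = loop
        self._waiter = waiter
        self._check_waiter = check_waiter
        self.sock_open = True  # models the underlying socket descriptor
        self._steps_left = 3   # pretend the handshake needs three steps
        loop.call_soon(self._on_handshake)

    def _on_handshake(self):
        if self._check_waiter and self._waiter.cancelled():
            # The caller gave up, so release the socket and stop.
            self.sock_open = False
            return
        self._steps_left -= 1
        if self._steps_left > 0:
            # Handshake not finished yet; schedule the next step.
            self._loop.call_later(0.05, self._on_handshake)
            return
        if not self._waiter.done():
            self._waiter.set_result(None)
        # Otherwise nobody is waiting any more: without the check above,
        # the finished transport has no owner and its socket stays open.


async def connect(check_waiter, timeout=0.01):
    loop = asyncio.get_running_loop()
    waiter = loop.create_future()
    transport = FakeSslTransport(loop, waiter, check_waiter)
    try:
        # The timeout is shorter than the handshake, so the waiter
        # future gets cancelled while handshake steps are pending.
        await asyncio.wait_for(waiter, timeout)
    except asyncio.TimeoutError:
        pass
    await asyncio.sleep(0.3)  # let the remaining handshake steps run
    return transport.sock_open


if __name__ == "__main__":
    print("leaked without check:", asyncio.run(connect(check_waiter=False)))
    print("leaked with check:", asyncio.run(connect(check_waiter=True)))
```

In this model the socket is released only when the handshake step looks at the cancelled waiter, which is the kind of check the real transport appears to be missing.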

@asvetlov @1st1

This will be fixed in the next Python bugfix version.
