aio-libs/aiohttp

aiohttp socket leak and incorrect connection close

thehesiod opened this issue ยท 16 comments

Long story short

Issue 1

it appears for certain servers the ssl sockets are not released correctly.

Issue 2:

I believe aiohttp.client._RequestContextManager.__aexit__ () should not close() the connector on exception as this should be up to the client of aiohttp. Instead it should always opportunistically release the socket. Incidentally changing this to release also fixes #1 in my scenario.

Expected behaviour

Issue 1

With the sample script the number of open handles should stay relatively stable.

Issue 2

raising a custom exception from an aiohttp request contest should not cause the connector to be closed and a new one opened.

Actual behaviour

Issue 1

number of context handles grows very quickly with test script. Is resolved by changing https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/connector.py#L392 to be abort(), or changing https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client.py#L555 to always release. However there appears to be an underlying issue with closing SSL sockets with certain servers.

Issue 2

tcpconnector is closed and re-opened when exception is raised from response context.

Steps to reproduce

Run sample script:

#!/usr/bin/env python3
import asyncio
import psutil
from aiohttp import web, ClientSession, TCPConnector
import ssl
from pympler.tracker import SummaryTracker

PORT = 8888

url = 'https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2001/07/01/LDAD/mesonet/netCDF/20010701_0800.gz'
# url = 'https://localhost:{}/'.format(PORT)
#url = 'https://localhost/20010701_0800.gz'

async def hello(request):
    return web.Response(status=304, content_type='application/octet-stream')


async def track_mem():
    tracker = SummaryTracker()
    proc = psutil.Process()
    while True:
        print("Number of open files: {} conns: {}".format(proc.num_fds(), len(proc.connections())))
        # tracker.print_diff()
        await asyncio.sleep(5)


async def server_startup(app):
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')

    connector = TCPConnector(limit=1, conn_timeout=10, ssl_context=ssl_context)
    session = ClientSession(connector=connector)

    # start 10 workers
    for i in range(10):
        asyncio.ensure_future(worker(session))

    # asyncio.ensure_future(track_mem())


async def worker(session: ClientSession):
    loop = asyncio.get_event_loop()
    loop.set_debug(True)

    while True:
        try:
            headers = {'If-Modified-Since': 'Sun, 01 Jul 2001 10:24:33 GMT'}
            async with session.get(url, headers=headers) as response:
                print('.', end='', flush=True)
                raise Exception
        except:
            pass


def main():
    app = web.Application()
    app.router.add_get('/', hello)
    app.on_startup.append(server_startup)

    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')

    web.run_app(app, port=PORT, ssl_context=ssl_context)


if __name__ == '__main__':
    main()

keep track of open files:

watch -n0.5 "lsof | wc -l"

and note how the number of handles grows very quickly. It should remain stable. I was unable to reproduce this with the sample server included so it must be some kind of special behavior of the server this testcase hits.

Your environment

OSX and linux, python3.5.3 with latest aiohttp

per the headers returned it's an apache server, I even tried reproducing with the OSX built-in apache server and it doesn't reproduce, so it seems like it's something particular to this server.

this is in fact related to SSL as the http endpoint ends up redirecting to the https endpoint :)

ends up being a asyncio issue! Here's the testcase:

import asyncio
import ssl


if True:  # madis
    HOST = 'madis-data.ncep.noaa.gov'
    PORT = 443
    PATH = '/madisPublic1/data/archive/2001/07/01/LDAD/mesonet/netCDF/20010701_0800.gz'
else:  # localhost
    HOST = 'localhost'
    PORT = 443
    PATH = '/20010701_0800.gz'


MSG = 'GET {path} HTTP/1.1\r\nIf-Modified-Since: Sun, 01 Jul 2001 10:24:33 GMT\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nHost: madis-data.ncep.noaa.gov\r\nUser-Agent: Python/3.5 aiohttp/1.3.0a0\r\nContent-Length: 0\r\n\r\n'.format(path=PATH)


class HttpClient(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.transport.write(MSG.encode('ascii'))
        print('data sent:', MSG)

    def data_received(self, data):
        print('data received:', data)

        asyncio.get_event_loop().call_later(1.0, self.transport.close)

    def eof_received(self):
        pass

    def connection_lost(self, exc):
        print('connection lost:', exc)
        asyncio.get_event_loop().stop()


async def main():
    loop = asyncio.get_event_loop()


    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')

    conn = await loop.create_connection(HttpClient, HOST, PORT, ssl=ssl_context)


if __name__ == '__main__':
    _loop = asyncio.get_event_loop()
    _loop.run_until_complete(main())
    _loop.run_forever()

will write up a bug a backref here

created PR for #2: #1571

Could you post response data that you receive in 'data_received'

madis:

data received: b'HTTP/1.1 304 Not Modified\r\nDate: Wed, 01 Feb 2017 01:15:15 GMT\r\nServer: Apache\r\nETag: "19c5ce-31166-38805b7fa9a40"\r\n\r\n'

my server:

data received: b'HTTP/1.1 304 Not Modified\r\nDate: Wed, 01 Feb 2017 01:15:51 GMT\r\nServer: Apache/2.4.23 (Unix) LibreSSL/2.2.7 PHP/5.6.23\r\nETag: "31166-5475d6f656580"\r\n\r\n'

issue 1 should be fixed in master
i submitted patch for issue 2, i am not sure how much time it will take

@thehesiod i added special code for closed ssl transport, in addition to close aiohttp aborts all ssl transport after 2 seconds. this should solve leaked sockets

@fafhrd91 thank you sir!

could you test if it works?

will test asap.

looks like the PR broke master:

Traceback (most recent call last):                                                                                                  โ”‚                                                                                                                                   
  File "/usr/local/lib/python3.5/asyncio/events.py", line 126, in _run                                                              โ”‚658                                                                                                                                
    self._callback(*self._args)                                                                                                     โ”‚                                                                                                                                   
  File "/usr/local/lib/python3.5/site-packages/aiohttp/connector.py", line 254, in _cleanup_closed                                  โ”‚                                                                                                                                   
    self._cleanup_closed_handle.close()
AttributeError: 'TimerHandle' object has no attribute 'close'

should be fixed

MUCH better, yay, thank you!

Just mentioning for others that might be mislead here: they circled back later and disabled this fix by default (see 02b0951).