aiohttp socket leak and incorrect connection close
thehesiod opened this issue · 16 comments
Long story short
Issue 1
It appears that for certain servers, SSL sockets are not released correctly.
Issue 2
I believe aiohttp.client._RequestContextManager.__aexit__() should not close() the connector on exception, as that should be up to the client of aiohttp. Instead, it should always opportunistically release the socket. Incidentally, changing this to release also fixes issue 1 in my scenario.
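A minimal sketch of the proposed behavior (only aiohttp.client._RequestContextManager.__aexit__ is named in the report; the attribute names below are assumptions, not aiohttp's actual implementation):

class _RequestContextManager:
    def __init__(self, coro):
        self._coro = coro
        self._resp = None

    async def __aenter__(self):
        self._resp = await self._coro
        return self._resp

    async def __aexit__(self, exc_type, exc, tb):
        # Proposed: release the connection back to the pool even when an
        # exception escapes the block, instead of closing it.
        # (In aiohttp 1.x, release() was a coroutine and would need awaiting.)
        self._resp.release()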
Expected behaviour
Issue 1
With the sample script the number of open handles should stay relatively stable.
Issue 2
Raising a custom exception from an aiohttp request context should not cause the connector to be closed and a new one opened.
Actual behaviour
Issue 1
The number of open handles grows very quickly with the test script. It is resolved by changing https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/connector.py#L392 to abort(), or changing https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client.py#L555 to always release. However, there appears to be an underlying issue with closing SSL sockets against certain servers.
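The distinction matters for SSL transports: close() initiates a TLS shutdown handshake and only releases the file descriptor once the peer answers, while abort() drops the socket immediately. An illustrative sketch using plain asyncio transport methods (not aiohttp's code):

def drop_connection(transport, graceful=True):
    if graceful:
        # Starts the TLS close_notify exchange; connection_lost (and the
        # fd release) waits on the peer's reply, which some servers never send.
        transport.close()
    else:
        # Closes the socket immediately, skipping the shutdown handshake.
        transport.abort()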
Issue 2
The TCPConnector is closed and re-opened when an exception is raised from the response context.
Steps to reproduce
Run sample script:
#!/usr/bin/env python3
import asyncio
import psutil
from aiohttp import web, ClientSession, TCPConnector
import ssl
from pympler.tracker import SummaryTracker

PORT = 8888

url = 'https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2001/07/01/LDAD/mesonet/netCDF/20010701_0800.gz'
# url = 'https://localhost:{}/'.format(PORT)
# url = 'https://localhost/20010701_0800.gz'


async def hello(request):
    return web.Response(status=304, content_type='application/octet-stream')


async def track_mem():
    tracker = SummaryTracker()
    proc = psutil.Process()
    while True:
        print("Number of open files: {} conns: {}".format(proc.num_fds(), len(proc.connections())))
        # tracker.print_diff()
        await asyncio.sleep(5)


async def server_startup(app):
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')
    connector = TCPConnector(limit=1, conn_timeout=10, ssl_context=ssl_context)
    session = ClientSession(connector=connector)

    # start 10 workers
    for i in range(10):
        asyncio.ensure_future(worker(session))
    # asyncio.ensure_future(track_mem())


async def worker(session: ClientSession):
    loop = asyncio.get_event_loop()
    loop.set_debug(True)
    while True:
        try:
            headers = {'If-Modified-Since': 'Sun, 01 Jul 2001 10:24:33 GMT'}
            async with session.get(url, headers=headers) as response:
                print('.', end='', flush=True)
                raise Exception
        except:
            pass


def main():
    app = web.Application()
    app.router.add_get('/', hello)
    app.on_startup.append(server_startup)
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')
    web.run_app(app, port=PORT, ssl_context=ssl_context)


if __name__ == '__main__':
    main()
Keep track of open files:
watch -n0.5 "lsof | wc -l"
and note how the number of handles grows very quickly; it should remain stable. I was unable to reproduce this with the included sample server, so it must be some special behavior of the server this test case hits.
Your environment
OS X and Linux, Python 3.5.3 with the latest aiohttp.
Per the headers returned it's an Apache server. I even tried reproducing with the OS X built-in Apache server and it doesn't reproduce, so it seems to be something particular to this server.
This is in fact related to SSL, as the http endpoint ends up redirecting to the https endpoint :)
This ends up being an asyncio issue! Here's the test case:
import asyncio
import ssl

if True:  # madis
    HOST = 'madis-data.ncep.noaa.gov'
    PORT = 443
    PATH = '/madisPublic1/data/archive/2001/07/01/LDAD/mesonet/netCDF/20010701_0800.gz'
else:  # localhost
    HOST = 'localhost'
    PORT = 443
    PATH = '/20010701_0800.gz'

MSG = 'GET {path} HTTP/1.1\r\nIf-Modified-Since: Sun, 01 Jul 2001 10:24:33 GMT\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nHost: madis-data.ncep.noaa.gov\r\nUser-Agent: Python/3.5 aiohttp/1.3.0a0\r\nContent-Length: 0\r\n\r\n'.format(path=PATH)


class HttpClient(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.transport.write(MSG.encode('ascii'))
        print('data sent:', MSG)

    def data_received(self, data):
        print('data received:', data)
        asyncio.get_event_loop().call_later(1.0, self.transport.close)

    def eof_received(self):
        pass

    def connection_lost(self, exc):
        print('connection lost:', exc)
        asyncio.get_event_loop().stop()


async def main():
    loop = asyncio.get_event_loop()
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    ssl_context.set_default_verify_paths()
    ssl_context.load_cert_chain('/tmp/cert', '/tmp/key')
    conn = await loop.create_connection(HttpClient, HOST, PORT, ssl=ssl_context)


if __name__ == '__main__':
    _loop = asyncio.get_event_loop()
    _loop.run_until_complete(main())
    _loop.run_forever()
Will write up a bug and backref here.
Could you post the response data that you receive in data_received?
madis:
data received: b'HTTP/1.1 304 Not Modified\r\nDate: Wed, 01 Feb 2017 01:15:15 GMT\r\nServer: Apache\r\nETag: "19c5ce-31166-38805b7fa9a40"\r\n\r\n'
my server:
data received: b'HTTP/1.1 304 Not Modified\r\nDate: Wed, 01 Feb 2017 01:15:51 GMT\r\nServer: Apache/2.4.23 (Unix) LibreSSL/2.2.7 PHP/5.6.23\r\nETag: "31166-5475d6f656580"\r\n\r\n'
Issue 1 should be fixed in master.
I submitted a patch for issue 2; I am not sure how much time it will take.
@thehesiod I added special code for closed SSL transports: in addition to close(), aiohttp aborts all SSL transports after 2 seconds. This should solve the leaked sockets.
Could you test if it works?
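For reference, the mitigation described above might look roughly like this (a sketch; only the _cleanup_closed name, visible in the traceback below, is taken from aiohttp's actual code, and the rest is assumed):

import asyncio


class ClosedTransportReaper:
    def __init__(self, loop, period=2.0):
        self._loop = loop
        self._period = period
        self._transports = []
        self._handle = None

    def close_transport(self, transport):
        # Try the graceful TLS shutdown first, but remember the transport
        # so it can be force-aborted if the peer never completes it.
        transport.close()
        self._transports.append(transport)
        if self._handle is None:
            self._handle = self._loop.call_later(self._period, self._cleanup_closed)

    def _cleanup_closed(self):
        # abort() reclaims the socket even if close_notify was never answered;
        # on an already-closed transport it is effectively a no-op.
        for transport in self._transports:
            transport.abort()
        self._transports.clear()
        self._handle = None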
Will test ASAP.
Looks like the PR broke master:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/asyncio/events.py", line 126, in _run
    self._callback(*self._args)
  File "/usr/local/lib/python3.5/site-packages/aiohttp/connector.py", line 254, in _cleanup_closed
    self._cleanup_closed_handle.close()
AttributeError: 'TimerHandle' object has no attribute 'close'
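(For anyone hitting the same error: loop.call_later() returns an asyncio.TimerHandle, which is disposed of with cancel(); it has no close() method.)

import asyncio

loop = asyncio.new_event_loop()
handle = loop.call_later(2.0, print, 'tick')  # returns asyncio.TimerHandle
handle.cancel()  # TimerHandle exposes cancel(), not close()
loop.close()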
Should be fixed.
MUCH better, yay, thank you!
Just mentioning for others that might be misled here: they circled back later and disabled this fix by default (see 02b0951).