Pool fails to reconnect when using Unix-domain socket (OperationalError: server closed the connection unexpectedly)
Closed this issue · 18 comments
If I use momoko.Pool to connect over TCP, it handles database disconnects and restarts perfectly. But if I connect over a Unix-domain socket, it does not. The first request after a disconnect crashes with "OperationalError: server closed the connection unexpectedly".
Here's an example script:
$ cat lostconn.py
#!/usr/bin/python
from __future__ import print_function
import sys
import subprocess
from tornado import gen, ioloop
import momoko
# what happens when momoko is connected to a database and the connection is
# killed? should recover, but doesn't if connection is over unix-domain
# socket
@gen.coroutine
def test(dsn, io_loop):
pool = momoko.Pool(dsn=dsn, size=1, ioloop=io_loop)
yield pool.connect()
print('pool:', pool)
cur = yield pool.execute('select pg_backend_pid()')
pid = cur.fetchall()[0][0]
print('pid 1:', pid)
cur.close()
sql = 'select pg_terminate_backend(%s)' % pid
cmd = ['psql', '-q', '-o/dev/null', 'momoko', '-c', sql]
print('running:', subprocess.list2cmdline(cmd))
subprocess.check_call(cmd)
cur = yield pool.execute('select pg_backend_pid()')
print('pid 2:', cur.fetchall()[0][0])
cur.close()
pool.close()
def main():
dsn = sys.argv[1]
io_loop = ioloop.IOLoop.instance()
future = test(dsn, io_loop)
io_loop.add_future(future, lambda x: io_loop.stop())
io_loop.start()
main()
I'm testing with:
- Ubuntu 15.10 (wily)
- Python 2.7.10 (ubuntu package)
- PostgreSQL 9.4.8 (ubuntu package)
- psycopg2 2.6.1 (ubuntu package)
- momoko git master (678f494 from 2016-06-15)
I've set things up with a momoko
database that I can login to without authentication, either via TCP or Unix-domain socket:
$ psql -h localhost momoko -c 'select pg_backend_pid()'
pg_backend_pid
----------------
15413
$ psql -h /var/run/postgresql momoko -c 'select pg_backend_pid()'
pg_backend_pid
----------------
15428
First, let's run the test script in "happy" mode, demonstrating that momoko successfully reconnects when using a TCP socket:
$ ./lostconn.py 'host=localhost dbname=momoko'
pool: <momoko.connection.Pool object at 0x7f6c42633550>
pid 1: 15434
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(15434)"
pid 2: 15447
Good!
Now try the same thing via Unix-domain socket:
$ ./lostconn.py 'host=/var/run/postgresql dbname=momoko sslmode=disable'
pool: <momoko.connection.Pool object at 0x7f367e723550>
pid 1: 15470
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(15470)"
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "./lostconn.py", line 32, in test
cur = yield pool.execute('select pg_backend_pid()')
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/data/src/momoko/momoko/connection.py", line 462, in when_available
future_or_result = method(conn, *args, **kwargs)
File "/home/data/src/momoko/momoko/connection.py", line 761, in execute
cursor.execute(operation)
OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I've seen the same behaviour with v2.2.1, v2.2.2, v2.2.3, and current git master.
Of possible interest: the failure isn't absolutely reliable. I did a couple of trials of 100 runs each, and I get the above failure ~70% of the time, but success ~30% of the time. Tried it with both pool size=1
and size=5
: similar result.
Oh yeah again: you can't see it in my little repro script, but this bug puts Pool in a bad state. Eg. in a long-running webapp, no database queries work after the disconnect. Everything fails immediately with that OperationalError, and momoko never recovers. Only workaround appears to be restarting the app.
Can you please rerun your example with debug enabled -logging.getLogger("momoko").setLevel(logging.DEBUG)
- and post the output of the failing section?
$ ./lostconn.py 'host=/var/run/postgresql dbname=momoko'
DEBUG:momoko:Opening new connection
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
DEBUG:momoko:All initial connection requests complete
pool: <momoko.connection.Pool object at 0x7f2f1c72c0d0>
DEBUG:momoko:Acquired free connection 7
DEBUG:momoko:Obtained connection: 7
DEBUG:momoko:About to release connection 7
DEBUG:momoko:The connection is alive
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
pid 1: 21048
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(21048)"
DEBUG:momoko:Acquired free connection 7
DEBUG:momoko:Obtained connection: 7
DEBUG:momoko:Method failed synchronously
DEBUG:momoko:About to release connection 7
DEBUG:momoko:The connection is alive
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "./lostconn.py", line 33, in test
cur = yield pool.execute('select pg_backend_pid()')
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/data/src/momoko/momoko/connection.py", line 462, in when_available
future_or_result = method(conn, *args, **kwargs)
File "/home/data/src/momoko/momoko/connection.py", line 761, in execute
cursor.execute(operation)
OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Another check please - can you change momoko's code to log the value of conn.closed
here?
It should be True
, otherwise it's a bug in psycopg2 (because if momoko gets error, and connection is alive, then I assume it's a ProgrammingError, or something else that is not appropriate to retry on Momoko's level).
conn.closed
is false in _retry()
.
$ git diff
diff --git a/momoko/connection.py b/momoko/connection.py
index bcee634..116b57c 100644
--- a/momoko/connection.py
+++ b/momoko/connection.py
@@ -492,6 +492,7 @@ class Pool(object):
return future
def _retry(self, retry, what, conn, keep, future):
+ log.debug('Connection._retry(): conn.closed = %r (retry = %r, what = %r, conn = %r, keep = %r, future = %r)', conn.closed, retry, what, conn, keep, future)
if conn.closed:
if not retry:
retry.append(conn)
$ ./lostconn.py 'host=/var/run/postgresql dbname=momoko'
DEBUG:momoko:Opening new connection
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
DEBUG:momoko:All initial connection requests complete
pool: <momoko.connection.Pool object at 0x7f700a338c10>
DEBUG:momoko:Acquired free connection 7
DEBUG:momoko:Obtained connection: 7
DEBUG:momoko:About to release connection 7
DEBUG:momoko:The connection is alive
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
pid 1: 21709
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(21709)"
DEBUG:momoko:Acquired free connection 7
DEBUG:momoko:Obtained connection: 7
DEBUG:momoko:Method failed synchronously
DEBUG:momoko:Connection._retry(): conn.closed = False (retry = [], what = <function when_available at 0x7f700a343578>, conn = <momoko.connection.Connection object at 0x7f700a338d10>, keep = False, future = <tornado.concurrent.Future object at 0x7f700f0e0cd0>)
DEBUG:momoko:About to release connection 7
DEBUG:momoko:The connection is alive
DEBUG:momoko:Handling free connection 7
DEBUG:momoko:No outstanding requests - adding to free pool
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "./lostconn.py", line 33, in test
cur = yield pool.execute('select pg_backend_pid()')
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/data/src/momoko/momoko/connection.py", line 462, in when_available
future_or_result = method(conn, *args, **kwargs)
File "/home/data/src/momoko/momoko/connection.py", line 762, in execute
cursor.execute(operation)
OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
So, to clarify, this is a psycopg2 bug? What exactly is the bug? Reporting errors differently over TCP socket and Unix-domain socket?
Yes, exactly. The exception says that "server closed the connection...", but the connection object states that it's not closed. And this happens only sometimes and only over Unix-domain sockets.
I remember back in early days of Momoko, the .closed
properly did not work on Unix sockets at all in async mode. I reported a bug on that. It was fixed but not fully, as we can see now. You are welcome to report new bug/reopen the existing one to psycopg2
. If you do, I'll appreciate if you reference it here.
I'm closing the issue since there is nothing really I can do on the subject rather to advise you to use TCP.
Feel free to disagree/reopen.
I can't reproduce a problem with psycopg2. However, its connection object's closed
attr is set to 2. Is momoko doing a closed == 1
somewhere, and incorrectly turning psycopg2 conn.closed == 2
to momoko conn.closed == False
?
Anyways, here is my psycopg2 test script:
$ cat lostconn.py
#!/usr/bin/python
from __future__ import print_function
import sys
import logging
import select
import psycopg2
import psycopg2.extras
log = logging.getLogger()
print(psycopg2)
print(psycopg2.__version__)
def main():
logging.basicConfig(level=logging.DEBUG)
dsn = sys.argv[1]
conn = psycopg2.connect(dsn, async=True)
print('conn:', conn) #, 'fd:', conn.fileno())
psycopg2.extras.wait_select(conn)
with conn.cursor() as cur:
cur.execute('select pg_backend_pid()')
psycopg2.extras.wait_select(conn)
pid = cur.fetchall()[0][0]
print('pid:', pid)
cur.execute('select pg_terminate_backend(pg_backend_pid())')
try:
psycopg2.extras.wait_select(conn)
cur.fetchall()
except psycopg2.DatabaseError as err:
log.info('as expected, caught error reading query result: %s %s', type(err), err)
print('self-terminated connection:', conn)
assert conn.closed, 'closed: %r' % (conn.closed,)
main()
I'm using the exact same setup as described before: specifically, psycopg2 is the ubuntu package of 2.6.1. Output using TCP socket:
./lostconn.py 'host=localhost dbname=momoko sslmode=disable'
<module 'psycopg2' from '/usr/lib/python2.7/dist-packages/psycopg2/__init__.pyc'>
2.6.1 (dt dec mx pq3 ext lo64)
conn: <connection object at 0x7fcd11cee280; dsn: 'host=localhost dbname=momoko sslmode=disable', closed: 0>
pid: 23610
INFO:root:as expected, caught error reading query result: <class 'psycopg2.DatabaseError'> server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
self-terminated connection: <connection object at 0x7fcd11cee280; dsn: 'host=localhost dbname=momoko sslmode=disable', closed: 2>
Output using Unix-domain socket:
$ ./lostconn.py 'host=/var/run/postgresql dbname=momoko'
<module 'psycopg2' from '/usr/lib/python2.7/dist-packages/psycopg2/__init__.pyc'>
2.6.1 (dt dec mx pq3 ext lo64)
conn: <connection object at 0x7f1ce8fa5280; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
pid: 23615
INFO:root:as expected, caught error reading query result: <class 'psycopg2.DatabaseError'> server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
self-terminated connection: <connection object at 0x7f1ce8fa5280; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 2>
Here is the .closed
property logic:
@property
def closed(self):
"""
Indicates whether the connection is closed or not.
"""
# 0 = open, 1 = closed, 2 = 'something horrible happened'
return self.connection.closed > 0 if self.connection else True
The scenario you are reproducing with pure psycopg2
is different. In your original test you kill server using shell, and then your code fails syncrhonously on cur.execute
, without entering event loop at all. In your pure psycopg2
however, server goes away while you are waiting in the event loop, so this is a different scenario.
Can you update your latter test with killing the server the same way you did before and check again? - If I'm right, your code should raise even before entering try/except clause.
The scenario you are reproducing with pure psycopg2 is different. In your original test you kill server using shell
[...]
Can you update your latter test with killing the server the same way you did before
Oops! My bad. I tried it... same result.
Here is my revised psycopg2 test script:
#!/usr/bin/python
from __future__ import print_function
import sys
import logging
import select
import subprocess
import psycopg2
import psycopg2.extras
log = logging.getLogger()
print(psycopg2)
print(psycopg2.__version__)
def main():
logging.basicConfig(level=logging.DEBUG)
dsn = sys.argv[1]
conn = psycopg2.connect(dsn, async=True)
print('conn:', conn) #, 'fd:', conn.fileno())
psycopg2.extras.wait_select(conn)
with conn.cursor() as cur:
cur.execute('select pg_backend_pid()')
psycopg2.extras.wait_select(conn)
pid = cur.fetchall()[0][0]
print('pid 1:', pid)
sql = 'select pg_terminate_backend(%s)' % pid
cmd = ['psql', '-q', '-o/dev/null', 'momoko', '-c', sql]
print('running:', subprocess.list2cmdline(cmd))
subprocess.check_call(cmd)
cur.execute('select pg_backend_pid()')
try:
psycopg2.extras.wait_select(conn)
pid = cur.fetchall()[0][0]
print('pid 2:', pid)
except psycopg2.DatabaseError as err:
log.info('as expected, caught error reading query result: %s %s', type(err), err)
print('terminated connection:', conn)
assert conn.closed, 'closed: %r' % (conn.closed,)
main()
Output using TCP socket:
./lostconn.py 'host=localhost dbname=momoko sslmode=disable'
<module 'psycopg2' from '/usr/lib/python2.7/dist-packages/psycopg2/__init__.pyc'>
2.6.1 (dt dec mx pq3 ext lo64)
conn: <connection object at 0x7f1f9205eb40; dsn: 'host=localhost dbname=momoko sslmode=disable', closed: 0>
pid 1: 24629
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(24629)"
INFO:root:as expected, caught error reading query result: <class 'psycopg2.DatabaseError'> server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
terminated connection: <connection object at 0x7f1f9205eb40; dsn: 'host=localhost dbname=momoko sslmode=disable', closed: 2>
Output using Unix-domain socket:
./lostconn.py 'host=/var/run/postgresql dbname=momoko'
<module 'psycopg2' from '/usr/lib/python2.7/dist-packages/psycopg2/__init__.pyc'>
2.6.1 (dt dec mx pq3 ext lo64)
conn: <connection object at 0x7f92308dbb40; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
pid 1: 24647
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(24647)"
INFO:root:as expected, caught error reading query result: <class 'psycopg2.DatabaseError'> server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
terminated connection: <connection object at 0x7f92308dbb40; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 2>
There must me some difference between the way mokoko example works and psycopg2 stand-alone. Because in mokoko output there is "Method failed synchronously" message, meaning that execute
call raised even before entering to event loop. I.e. the scenario is not exactly the same.
So far the difference I see is that synthetic example reuses the same cursor object through context manager, while Momoko creates new cursor object for every execute call
as written here.
Can you please change your sample to obtain new cursor object before each execute
call?
Can you please change your sample to obtain new cursor object before each execute call?
Sure. I just tried it, and it made no difference: the psycopg2 connection object's closed
attr is 2 every time.
I'm going to take a different approach: start with my momoko example, copy the necessary bits of momoko into my example script, and then rip out as much code as I can while still seeing the incorrect closed
attr.
BTW, I can't reopen this bug. Would appreciate if you could reopen until I can clearly point to a problem with psycopg2.
For the record, here is a repro script that uses momoko but focuses on the closed
attribute. It clearly demonstrates inconsistent results coming from psycopg2.
$ cat badclosed.py
#!/usr/bin/python
from __future__ import print_function
import sys
import logging
import subprocess
import psycopg2
from tornado import gen, ioloop
import momoko
# if a connection is closed externally, its 'closed' attr should always be true
# (because psycopg2 should always set its 'closed' attr to non-zero)
@gen.coroutine
def test(dsn):
conn = momoko.Connection(dsn=dsn)
yield conn.connect()
print('psycopg2 conn:', conn.connection)
cur = yield conn.execute('select pg_backend_pid()')
pid = cur.fetchall()[0][0]
print('pid 1:', pid)
cur.close()
sql = 'select pg_terminate_backend(%s)' % pid
cmd = ['psql', '-q', '-o/dev/null', 'momoko', '-c', sql]
print('running:', subprocess.list2cmdline(cmd))
subprocess.check_call(cmd)
print('before trying another query: psycopg2 conn:', conn.connection)
try:
cur = yield conn.execute('select pg_backend_pid()')
except psycopg2.DatabaseError as err:
print('as expected, got %s: %s' % (err.__class__.__name__, err))
print('after trying another query: psycopg2 conn: ', conn.connection)
assert conn.connection.closed == 2, conn.connection.closed
assert conn.closed, conn.closed
def main():
dsn = sys.argv[1]
logging.basicConfig()
logging.getLogger("momoko").setLevel(logging.DEBUG)
io_loop = ioloop.IOLoop.instance()
future = test(dsn)
io_loop.add_future(future, lambda x: io_loop.stop())
io_loop.start()
main()
Run it once on a Unix-domain socket:
$ ./badclosed.py 'host=/var/run/postgresql dbname=momoko'
psycopg2 conn: <connection object at 0x7fce7b069e88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
pid 1: 9792
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(9792)"
before trying another query: psycopg2 conn: <connection object at 0x7fce7b069e88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
as expected, got OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
after trying another query: psycopg2 conn: <connection object at 0x7fce7b069e88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "./badclosed.py", line 39, in test
assert conn.connection.closed == 2, conn.connection.closed
AssertionError: 0
Run it again, and get a different result:
$ ./badclosed.py 'host=/var/run/postgresql dbname=momoko'
psycopg2 conn: <connection object at 0x7f2c0d36ae88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
pid 1: 9811
running: psql -q -o/dev/null momoko -c "select pg_terminate_backend(9811)"
before trying another query: psycopg2 conn: <connection object at 0x7f2c0d36ae88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 0>
as expected, got DatabaseError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
after trying another query: psycopg2 conn: <connection object at 0x7f2c0d36ae88; dsn: 'host=/var/run/postgresql dbname=momoko', closed: 2>
Yeah, so there is definitely some fuzziness in psycopg2. I'm sure you'll catch it up. Sorry for not providing more assistance - pretty swamped with other things recently.
OK I have a very clear reproduction script in psycopg2, and I found a fix. No PR for psycopg2 yet, but the bug is definitely there: psycopg/psycopg2#443.
This bug is definitely not momoko's fault. @haizaar, thanks for your help in debugging this!
Sure! Kudos on the thorough research.
Extra confirmation: I just retried my two repro scripts that appeared to show a problem with momoko (lostconn.py and badclosed.py). I can still reproduce the problem using latest released psycopg2, but with my patch (psycopg/psycopg2#443), momoko behaves perfectly.