leafo/pgmoon

Random disconnects

dridri opened this issue · 4 comments

Hi, I oftenly got this error :

/usr/share/lua/5.3/pgmoon/init.lua:600: attempt to index a nil value (field 'sock')
Stack trace :
     in ??(self, t, data, len) at /usr/share/lua/5.3/pgmoon/init.lua:600
     in q(self, q) at /usr/share/lua/5.3/pgmoon/init.lua:294

It seems that pgmoon randomly disconnects from the PSQL server, I'm running it standalone (without Resty). I tried dumb fixes such as setTimeout(-1), and trying to "ping" the database every 30seconds, but still not enough.

I found out that the socket is garbage collected under heavy load, hooking __gc of pgresult.sock.sock is enough to prevent it. So it seems to be related to luasocket, but not 100% sure.

leafo commented

I'm closing this as it appears to be a luasocket bug. I recommend using cqueues if possible as your network socket.

I found out that the socket is garbage collected under heavy load, hooking __gc of pgresult.sock.sock is enough to prevent it.

Interesting, seeing something that appears to be the same problem, using HAProxy Socket class... how does one hook __gc, do you mean overload that function definition?

Initially pg connects fine and even waits for a notification, but then begins to error:

[kubernetes-ingress-controller] Init new socket of type:
[kubernetes-ingress-controller] haproxy
[kubernetes-ingress-controller] HAProxy class Socket    table
[kubernetes-ingress-controller] LISTEN test
[kubernetes-ingress-controller] Checking for async postgres notification every 10000 ms
[kubernetes-ingress-controller]
[kubernetes-ingress-controller] Waiting for notifications...
[kubernetes-ingress-controller] nil
...
[kubernetes-ingress-controller] Waiting for notifications...
[kubernetes-ingress-controller] Lua task: runtime error: /usr/local/share/lua/5.3/pgmoon/init.lua:740: attempt to index a nil value (field 'sock') from /usr/local/share/lua/5.3/pgmoon/init.lua:740: in method 'receive_message', /usr/local/share/lua/5.3/pgmoon/init.lua:559: in method 'wait_for_notification', /usr/local/share/lua/5.3/auth.lua:176: in function line 150.

My code:

  1. Creates a new pg connection instance via pgmoon.new
  2. Queries LISTEN test; to listen for notifications (pgnotify)
  3. Loops, calling pg:wait_for_notification() in each iteration of the loop

Here is my Lua code (runs in HAProxy):

function listen()
    print(pg_settings)
    local pg = pgmoon.new(pg_settings)
    pg:settimeout(5000) -- Sets the timeout value (in milliseconds) for all socket operations (connect, write, receive)
    pg.convert_null = true -- https://github.com/leafo/pgmoon#converting-nulls
    local success, err = pg:connect()
    if not success then
        local msg = string.format(
                'Error connecting (%s) host=%s db=%s user=%s',
                err,
                pg_settings['host'], pg_settings['database'], pg_settings['user']
        );
        core.log(core.err, msg)
        reply_503(txn)
    end

    -- query database for auth info for the token
    local interval = 10000
    local channel = 'test'
    local query = string.format("LISTEN %s", channel)
    print(query)
    pg:query(query)
    print(string.format("Checking for async postgres notification every %s ms\n", interval))
    while true do
        print('Waiting for notifications...')
        local n = pg:wait_for_notification() -- store result in some table
        print(dump(n))
        core.msleep(interval)
    end
end

core.register_task(listen)

Anything obviously wrong about this?

My issue was (you can see in the code above) that my timeout was 5000 ms while the loop interval was 10000ms so naturally the socket was closed after one iteration of the loop. Sorry for any noise/confusion!