Error: write CONNECTION_CLOSED false
mdorda opened this issue · 22 comments
First of all, thank you so much for this amazing library! It is the best postgres driver so far.
We have a service making ~60 req/s to the database. The postgres package uses default settings; only the connection options (host, port, username, password, database, ...) are set. Everything works like a charm, but after roughly 2 hours a small number of errors starts to occur:
Error: write CONNECTION_CLOSED false
at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
at Object.onceWrapper (events.js:286:20)
at Socket.emit (events.js:198:13)
at TCP._handle.close (net.js:607:12)
The number of errors increases with application uptime. It looks like bad pool handling to me. We are testing the timeout option right now, so I will let you know whether it helps.
Nope, it did not help.
Thanks @mdorda
For starters, it's odd that connections close sporadically like that.
Timeout does nothing other than close idle connections in the pool when they are unused, so I don't think it would help either with ~60 req/s.
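For reference, a minimal sketch of how that option is set (note it's in seconds):

const sql = postgres({
  // ...connection options...
  timeout: 10 // close connections that have been idle in the pool for more than 10 s
})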
Can you share a little about your setup?
Thanks for the answer. I found another error in our logs which occurs together with the one above:
Error: read ECONNRESET
at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
Our infrastructure:
- the database is on a separate machine, but in the same cluster
- we use Kubernetes and Docker
- we use the pg package for our other projects and faced a very similar issue there, but statement_timeout and connectionTimeoutMillis solved everything (rough sketch below): from time to time a container can lose its connection to the database, but the pool reconnects automatically (talking about the pg package)
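Roughly like this, a sketch with illustrative values rather than our exact production config:

const { Pool } = require('pg')

const pool = new Pool({
  host: '...',
  port: 1234,
  database: 'abcd',
  user: 'user',
  password: 'pass',
  statement_timeout: 10000,       // abort any statement running longer than 10 s
  connectionTimeoutMillis: 10000  // give up if acquiring a connection takes longer than 10 s
})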
How we use your package:
Every time a container starts, this runs once:
const sql = postgres({
  host: '...',
  port: 1234,
  database: 'abcd',
  username: 'user',
  password: 'pass',
  timeout: 10,
  types: [
    ... one postgis type definition ...
  ],
  connection: { statement_timeout: 10000 }
})

And with every user request (~60 req/s), code similar to the following is called:
await sql`
  select this, that, etc
  from our_table
  where ST_Intersects(${sql.types.point([lat, lon])}, way);
`

That is all. No updates, no inserts. Just one select per user request.
Hope it is enough. Thank you for your help!
Ok, that error makes sense too if connections are being dropped.
Maybe there is a timeout on the other end for how long each TCP connection can stay open, and that's why setting a very low timeout value helps. The silly thing is that you then incur the connection startup cost very often, which is also a bad solution (although it does mean you don't see any errors).
I think the correct solution is to figure out why connections are dropped like that.
What timeout value are you using with pg? Note that Postgres.js timeout is in seconds while pg's is in milliseconds.
We use only statement_timeout: 10000 and connectionTimeoutMillis: 10000 for pg, nothing more. We will check the other end. Thank you for the suggestion. But I do not think that is the reason; we have 3 clusters (Europe, Asia and America):
- Europe: ~60 req/s, the error occurs after ~2h
- Asia: ~30 req/s, the error occurs after ~4h
- America: ~25 req/s, the error occurs after ~5h
A timeout on the other end would result in the same times regardless of the region.
Oh, that's interesting. So it actually seems related to the number of requests, not the time...
Do you have any trace of anything related in the PostgreSQL logs? Like, are the dropped connections due to the network, or is PostgreSQL closing them?
You could try adding onnotice: console.log to your options to see if Postgres sends anything.
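Something like this, keeping the rest of your options as they are:

const sql = postgres({
  // ...your existing connection options...
  onnotice: console.log // print any notices the server sends
})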
Ok, so I dug a bit deeper: connectionTimeoutMillis in pg actually sets the PostgreSQL parameter connect_timeout and isn't functionally related to my timeout property.
You can try to do
connection: {
  statement_timeout: 10000,
  connect_timeout: 10000
}
which should then make Postgres.js connections have the same settings as pg.
It is not possible to set connect_timeout; it fails with an unrecognized configuration parameter "connect_timeout" error. That is strange, because I found the same parameter in the pg source code. Anyway, according to the Postgres documentation there are basically two timeout settings: statement_timeout and idle_in_transaction_session_timeout. Both are set to 10000, but it did not help. The log is still the same, even with onnotice: console.log turned on.
In the log there are thousands of occurrences of this error:
{ Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  syscall: 'read' }
And right after that:
{ Error: write CONNECTION_CLOSED false
at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
at Object.onceWrapper (events.js:286:20)
at Socket.emit (events.js:198:13)
at TCP._handle.close (net.js:607:12)
Is there any option to programmatically reconnect all dead connections?
After observing it for longer, it seems it does not depend on the number of requests after all. Busier clusters show the error earlier than others, but sometimes it takes 30 minutes, sometimes 2 hours... So the number of requests just increases the probability of the error.
Ah sorry, it's not a connection parameter, it's an option for libpq...
It seems connectionTimeoutMillis in pg is just a timeout for the connection phase, before the connection is ready for queries. I don't understand how those options solved your issue in pg.
Is there any option to programmatically reconnect all dead connections?
Connections will automatically reconnect at the next query after the error you got.
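If you don't want to wait for the next request, a rough sketch of a retry wrapper could look like the following. I'm assuming the thrown error exposes a CONNECTION_CLOSED code, so it also matches on the message as a fallback:

async function withRetry(run, retries = 1) {
  try {
    return await run()
  } catch (err) {
    // assumed error shape: a code property, with the message as fallback
    const closed = err.code === 'CONNECTION_CLOSED' || /CONNECTION_CLOSED/.test(err.message)
    if (retries > 0 && closed)
      return withRetry(run, retries - 1) // the pool reconnects on the next attempt
    throw err
  }
}

// usage: const rows = await withRetry(() => sql`select ...`)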
Hmm... do you have this in a test setup, or are you seeing the issues in production? I'd suggest some things to try, but I'd rather try to replicate the issue myself if you'd otherwise end up testing in prod :)
It needs load, so it is a production issue. We just rewrote it to use the pg package, to check whether it helps in this specific case. But I would prefer your package, so I am eager to get some tips :-)
Ok, cool 🙂
I do have some stability improvements for the next major (currently on master), but there are also a few breaking changes (bigints from Postgres are now cast correctly to BigInt, or to string depending on Node version support).
While looking into your issue here, I also found an issue with errors not being thrown correctly in some instances, so you could try out what's on master currently.
Just to let you know, the pg version works without any (non-performance) problems. I will try the master version next week and let you know. Thank you!
I spent some time tonight implementing the connect_timeout parameter, and at the same time I found an issue that could cause stale connections. It would be really nice if you'd give master a try using e.g. connect_timeout: 10 (the default is currently 30 seconds).
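That is, roughly:

const sql = postgres({
  // ...your existing connection options...
  connect_timeout: 10, // seconds, like the timeout option
  connection: { statement_timeout: 10000 }
})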
Hi, thank you so much for the info. I am quite busy this week, so I will let you know next week. I have everything prepared in a branch; I just have not had time to deploy it and watch it in production :-)
Ah that's awesome ;)
Be sure to test things first, as there are a few breaking changes in master. Mainly, bigint is now returned as a string instead of being unsafely cast to a number.
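A quick illustration of that change, as a sketch assuming the string behaviour described above:

// on master a bigint column comes back as a string instead of an unsafely cast number
const [row] = await sql`select 9007199254740993::bigint as big`
typeof row.big              // 'string'
const big = BigInt(row.big) // convert explicitly where needed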
I haven't forgotten about this, thanks for your patience :-)
It looks like it is working now! Thank you for the fix.
I'm also encountering something similar with a PostgreSQL service on Render.com, but with a very low number of connections.
I've documented the issue here: https://community.render.com/t/econnreset-with-node-js-and-postgresql/1498
In case this doesn't go anywhere soon, I may create a new issue here too.