Error: write CONNECTION_CLOSED false
mdorda opened this issue · 22 comments
First of all, thank you so much for this amazing library! It is the best postgres driver so far.
We have a service making ~60 req/s to the database. The postgres package uses default settings; only the connection options (host, port, username, password, database, ...) are set. Everything works like a charm, but after roughly 2 hours a small number of errors starts to occur:
Error: write CONNECTION_CLOSED false
at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
at Object.onceWrapper (events.js:286:20)
at Socket.emit (events.js:198:13)
at TCP._handle.close (net.js:607:12)
The number of errors increases with application uptime. It looks like bad pool handling to me. We are testing the timeout option right now, so I will let you know whether it helps.
Nope, it did not help.
Thanks @mdorda
For starters, it's odd that connections close sporadically like that.
Timeout does nothing other than close idle connections in the pool when they are unused, so I don't think it would help either with ~60 req/s.
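For reference, a minimal sketch of how that option is set (note it's in seconds):

const sql = postgres({
  // ...connection options...
  timeout: 10 // close connections that have been idle in the pool for more than 10 s
})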
Can you share a little about your setup?
Thanks for the answer. I found another error in our logs which occurs together with the one above:
Error: read ECONNRESET
at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
Our infrastructure:
- the database is on a separate machine, but in the same cluster
- we use Kubernetes and Docker
- we use the pg package for our other projects and faced a very similar issue there, but statement_timeout and connectionTimeoutMillis solved everything (rough sketch below): from time to time a container can lose its connection to the database, but the pool reconnects automatically (talking about the pg package)
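Roughly like this, a sketch with illustrative values rather than our exact production config:

const { Pool } = require('pg')

const pool = new Pool({
  host: '...',
  port: 1234,
  database: 'abcd',
  user: 'user',
  password: 'pass',
  statement_timeout: 10000,       // abort any statement running longer than 10 s
  connectionTimeoutMillis: 10000  // give up if acquiring a connection takes longer than 10 s
})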
How we use your package:
Every time a container starts, this runs once:
const sql = postgres({
  host: '...',
  port: 1234,
  database: 'abcd',
  username: 'user',
  password: 'pass',
  timeout: 10,
  types: [
    ... one postgis type definition ...
  ],
  connection: { statement_timeout: 10000 }
})

And with every user request (~60 req/s), code similar to the following is called:
await sql`
  select this, that, etc
  from our_table
  where ST_Intersects(${sql.types.point([lat, lon])}, way);
`

That is all. No updates, no inserts. Just one select per user request.
Hope it is enough. Thank you for your help!
Ok, that error makes sense too if connections are being dropped.
Maybe there is a timeout on the other end for how long each TCP connection can stay open, and that's why setting a very low timeout value helps. The silly thing is that you then incur the connection startup cost very often, which is also a bad solution (although it does mean you don't see any errors).
I think the correct solution is to figure out why connections are dropped like that.
What timeout value are you using with pg? Note that Postgres.js timeout is in seconds while pg's is in milliseconds.
We use only statement_timeout: 10000 and connectionTimeoutMillis: 10000 for pg, nothing more. We will check the other end. Thank you for the suggestion. But I do not think that is the reason; we have 3 clusters (Europe, Asia and America):
- Europe: ~60 req/s, the error occurs after ~2h
- Asia: ~30 req/s, the error occurs after ~4h
- America: ~25 req/s, the error occurs after ~5h
A timeout on the other end would result in the same times regardless of the region.
Oh, that's interesting. So it actually seems related to the number of requests, not the time...
Do you have any trace of anything related in the PostgreSQL logs? Like, are the dropped connections due to the network, or is PostgreSQL closing them?
You could try adding onnotice: console.log to your options to see if Postgres sends anything.
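Something like this, keeping the rest of your options as they are:

const sql = postgres({
  // ...your existing connection options...
  onnotice: console.log // print any notices the server sends
})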
Ok, so I dug a bit deeper: connectionTimeoutMillis in pg actually sets the PostgreSQL parameter connect_timeout and isn't functionally related to my timeout property.
You can try to do
connection: {
  statement_timeout: 10000,
  connect_timeout: 10000
}
which should then make Postgres.js connections have the same settings as pg.
It is not possible to set connect_timeout; it fails with an unrecognized configuration parameter "connect_timeout" error. That is strange, because I found the same parameter in the pg source code. Anyway, according to the Postgres documentation there are basically two timeout settings: statement_timeout and idle_in_transaction_session_timeout. Both are set to 10000, but it did not help. The log is still the same, even with onnotice: console.log turned on.
In the log there are thousands of occurrences of this error:
{ Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  syscall: 'read' }
And right after that:
{ Error: write CONNECTION_CLOSED false
at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
at Object.onceWrapper (events.js:286:20)
at Socket.emit (events.js:198:13)
at TCP._handle.close (net.js:607:12)
Is there any option to programmatically reconnect all dead connections?
After observing it for longer, it seems it does not depend on the number of requests after all. Busier clusters show the error earlier than others, but sometimes it takes 30 minutes, sometimes 2 hours... So the number of requests just increases the probability of the error.
Ah sorry, it's not a connection parameter, it's an option for libpq...
It seems connectionTimeoutMillis in pg is just a timeout for the connection phase, before the connection is ready for queries. I don't understand how those options solved your issue in pg.
Is there any option to programmatically reconnect all dead connections?
Connections will automatically reconnect at the next query after the error you got.
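If you don't want to wait for the next request, a rough sketch of a retry wrapper could look like the following. I'm assuming the thrown error exposes a CONNECTION_CLOSED code, so it also matches on the message as a fallback:

async function withRetry(run, retries = 1) {
  try {
    return await run()
  } catch (err) {
    // assumed error shape: a code property, with the message as fallback
    const closed = err.code === 'CONNECTION_CLOSED' || /CONNECTION_CLOSED/.test(err.message)
    if (retries > 0 && closed)
      return withRetry(run, retries - 1) // the pool reconnects on the next attempt
    throw err
  }
}

// usage: const rows = await withRetry(() => sql`select ...`)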
Hmm... do you have this in a test setup, or are you seeing the issues in production? I'd suggest some things to try, but I'd rather try to replicate the issue myself if you'd otherwise end up testing in prod :)
It needs load, so it is a production issue. We just rewrote it to use the pg package, to check whether it helps in this specific case. But I would prefer your package, so I am eager to get some tips :-)
Ok, cool 🙂
I do have some stability improvements for the next major (currently on master), but there are also a few breaking changes (bigints from Postgres are now cast correctly to BigInt, or to string depending on Node version support).
While looking into your issue here, I also found an issue with errors not being thrown correctly in some instances, so you could try out what's on master currently.
Just to let you know, the pg version works without any (non-performance) problems. I will try the master version next week and let you know. Thank you!
I spent some time tonight implementing the connect_timeout parameter, and at the same time I found an issue that could cause stale connections. It would be really nice if you'd give master a try using e.g. connect_timeout: 10 (the default is currently 30 seconds).
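That is, roughly:

const sql = postgres({
  // ...your existing connection options...
  connect_timeout: 10, // seconds, like the timeout option
  connection: { statement_timeout: 10000 }
})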
Hi, thank you so much for the info. I am quite busy this week, so I will let you know next week. I have everything prepared in a branch; I just have not had time to deploy it and watch it in production :-)
Ah that's awesome ;)
Be sure to test things first, as there are a few breaking changes in master. Mainly, bigint is now returned as a string instead of being unsafely cast to a number.
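A quick illustration of that change, as a sketch assuming the string behaviour described above:

// on master a bigint column comes back as a string instead of an unsafely cast number
const [row] = await sql`select 9007199254740993::bigint as big`
typeof row.big              // 'string'
const big = BigInt(row.big) // convert explicitly where needed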
I haven't forgotten about this, thanks for your patience :-)
It looks like it is working now! Thank you for the fix.
I'm also encountering something similar with a PostgreSQL service on Render.com, but with a very low number of connections.
I've documented the issue here: https://community.render.com/t/econnreset-with-node-js-and-postgresql/1498
In case this doesn't go anywhere soon, I may create a new issue here too.