`.stop()` sometimes hangs indefinitely
marcus-pousette opened this issue · 8 comments
-
Version:
1.8.1 -
Subsystem:
@libp2p/websocket , connection manager, components
Severity:
High
Description:
Steps to reproduce the error:
I have a test scenario with a few nodes that I connect to each other and send some data
Sometimes when terminating a node it endlessly waits for the Websocket transport to shutdown. This is because there are connections that does close (?) and in the it-ws
library there are no calls to server.closeAllConnections()
before close
.
I have managed to make the shutdown work better if modify
to terminate the components in the reverse order in sequentially... so I suspect there is some kind of race condition going on or leak of some sortAlso no issues if I use Tcp transport instead.
This is not a good description on how to reproduce the issue, because it is a bit random when it occurs. But wanted to create this issue if others have the same problem and will update the description if I have a isolated a good test
When the stop timeouts, I have one connection left with 0 streams,
remoteAddr: '/ip6/::ffff:7f00:1/tcp/56327/p2p/12D3KooWApN6mEB5666wTpqpzkgcKXC7kgMxmm7CAWRiD3WAg4Yv'
direction: 'inbound'
encryption: '/noise'
multiplexer: '/yamux/1.0.0'
transient: false
Closing the connectionManager
component before all other components seems also to fix the issue
async _invokeStartableMethod(methodName) {
const timeout = new Promise((resolve, reject) => {
setTimeout(() => {
reject('timeout')
}, 10000);
});
const connectionManager = this.components["connectionManager"]
if (isStartable(connectionManager)) {
await connectionManager[methodName]?.();
}
let promise = Promise.all(Object.values(this.components)
.filter(obj => isStartable(obj))
.map(async (startable) => {
await startable[methodName]?.();
}));
const race = Promise.race([promise, timeout]);
try {
await race;
}
catch (e) {
throw e;
}
finally {
clearTimeout(timeout);
}
}
in the it-ws library there are no calls to server.closeAllConnections() before close.
I may be missing something but where is the closeAllConnections
method defined? The it-ws
module stores a reference to a http
/https
net.Server which has no such method.
no issues if I use Tcp transport
Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.
Do you still see the issue if the transport aborts the connections instead of closing them on close?
http.Server
and https.Server
has that method since NodeJs 18.2.0
/// it-ws/server.ts
async close (): Promise<void> {
await new Promise<void>((resolve, reject) => {
this.server.closeAllConnections()
this.server.close((err) => {
if (err != null) {
reject(err); return
}
resolve()
})
})
}
Actually the closeAllConnections trick does work since the connection is a tcp
one (?)
The close connection manager first trick seems to always work.
Also adding like a 5s delay before stopping also seems to work.
It feels like there is some kind of connection creation in process that is not caught correctly on closing
http.Server
andhttps.Server
has that method since NodeJs 18.2.0
Aha, that's interesting, nice find.
The docs do say though:
This does not destroy sockets upgraded to a different protocol, such as WebSocket or HTTP/2
...so I wonder if there's something else going on here?
Ye the whole thing about aborting Websocket connections before closing does not seem to have an impact. Only the two things I mentioned in my last comment did mitigate the problem for me
Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.
That would explain something!