fastify/fastify-websocket

Node server crashes with ERR_INTERNAL_ASSERTION error

jsmircic opened this issue · 11 comments

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Fastify version

4.x

Plugin version

7.1.3

Node.js version

14.x

Operating system

Linux

Operating system version (i.e. 20.04, 11.3, 10)

Azure AppServices for linux (Node running inside docker container)

Description

We are getting some intermittent container crashes and looking at the logs looks like source of the exception is @fastify/websockets (logs below). We are using @fastify in combination with TRPC subscriptions for real-time communication.

Container exited unexpectedly: last 10 seconds logs [2023-04-17T06:26:19.633055587Z Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
2023-04-17T06:26:19.633108688Z Please open an issue with this stack trace at https://github.com/nodejs/node/issues
2023-04-17T06:26:19.633116188Z 
2023-04-17T06:26:19.633120988Z     at new NodeError (internal/errors.js:322:7)
2023-04-17T06:26:19.633126088Z     at assert (internal/assert.js:14:11)
2023-04-17T06:26:19.633131288Z     at ServerResponse.assignSocket (_http_server.js:225:3)
2023-04-17T06:26:19.633136088Z     at Server. (/usr/src/app/server/dist/node_modules/@fastify/websocket/index.js:52:19)
2023-04-17T06:26:19.633141288Z     at Server.emit (events.js:400:28)
2023-04-17T06:26:19.633145788Z     at Server.emit (domain.js:475:12)
2023-04-17T06:26:19.633149988Z     at onParserExecuteCommon (_http_server.js:703:14)
2023-04-17T06:26:19.633154388Z     at onParserExecute (_http_server.js:616:3) {
2023-04-17T06:26:19.633159188Z   code: 'ERR_INTERNAL_ASSERTION'
2023-04-17T06:26:19.633164488Z }
]

I believe server should not completely crash and restart on such websocket errors. Any help or guidelines on we can resolve this problem. Not sure if this is the right place to raise the issue as the problem could be in the fastify plugin for TRPC? But from the error logs we can only pinpoint to @fastify/websocket.

Thanks

Steps to Reproduce

We are unable to reproduce this issue in our development or staging environments, it is happening only in production.

Expected Behavior

The server should not crash with an ERR_INTERNAL_ASSERTION error.

It is hard to helps if there is no reproducible code or the problem is cloud platform specific.

I understand. Hard to provide reproduction as it's only production environment and we are having lots of clients connected. In all of our testing this never happened.

Looking at stack trace, this line of Node httpserver is failing

https://github.com/nodejs/node/blob/edd64fe5ba20c6c2f53720c672c591f09d9d4ac0/lib/_http_server.js#L225

Don't really have enough knowledge to make sense of what is all happening here. If you have any idea on how we could further investigate please let me know. Thanks

Will it be the environment itself replaced the http.IncommingMessage to others?
For example, it is easily to happen in server-less environment.

I doubt it, as the Node is running inside custom built docker container. And also everything is working ok 99% of the time, sockets are connected and messages are delivered.
Just sometimes this error occurs and server restarts. Had 3 crashes logged yesterday.
Not a huge problem as server restarts fast, but some users do get some failed requests at that point. For that reason would be great to find a solution.

I am think if ._httpMessage is null, it basically means that the request is detached from the socket.
By that meaning, some client aborted the request while we are assigning a empty socket to response.

As I understand socket._httpMessage should be undefined or null in a call to rawResponse.assingSocket(socket). For some reason in our case only sometimes it's not. Maybe the problem is not here but in TRPC fastify integration. But still as this can happen, i'm thinking such case should be handled also here and not end up in crashing the process? Would like to be of more help but my knowledge on websockets internals is limited :(

I think the problem here is that the socket is destroyed before completing the handshake, so there is that error.

I've seen this error myself a few times.

(Note that this is mostly a bug in Node.js core that we will work around here).

So, the problem is not related to socket being destroyed, but to sockets calling assignSocket twice.

@jsmircic could you try and verify if #253 solves it? It fixes my case, but I'm unsure about yours.

I would need a repro.

Tnx, sure, I'll report back with findings. But it's hard for us to create a repro. Happens only when larger number of clients is connected. We couldn't detect any pattern, occurs on random requests.
We did some changes on the TRPc side that might have helped. Didn't have any crashes in last 2 days.

On our side looks like the issue is resolved. Thank you.