nodejs/node

Passing fd between threads causes VM to crash, or EBADF

Opened this issue · 4 comments

  • Version: 12.13.0
  • Platform: Linux x 5.0.0-36-generic #39-Ubuntu SMP Tue Nov 12 09:46:06 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: net, worker_threads

From forks to threads

I was working on moving from forks to Worker Threads and was experimenting with passing file descriptors between threads. Forks work great with passing file descriptors around due to the extended IPC, making it possible to share open ports.

Sharing file descriptors

Using Worker Threads, this becomes impossible, For that reason, I had to create a master thread handling connection events, and pass those connections to worker threads through postMessage. Since the whole object cannot be passed through message, I thought the lightest and fastest way to do this would be to post the fd as a message, and have the worker thread create a new Socket using the fd.

It works great until it does not. It is really unpredictable, but always seems to crash at some point.

Repro steps

Here is the smallest portion of code I could come up with to reproduce the issue. Those are two files : master.js and worker.js.

Full test HERE.

// master.js
const { Worker } = require('worker_threads');
const net = require('net');

const workers = new Worker("./worker.js");

const server = net.createServer(conn => {
    conn.unref();
    workers.postMessage({ duplex_fd : conn._handle.fd });
});

server.listen(12345);
// worker.js
const { Socket } = require('net');

require('worker_threads').parentPort.on('message', (msg) => {
    const sock = new Socket({ fd : msg.duplex_fd, readable : true, writable : true, allowHalfOpen : true });  
    sock.end("Hello, World", () => {
        sock.destroy();
    });
})

After an unpredictable while, I get either one of those two errors :

node: ../deps/uv/src/unix/core.c:930: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.
Aborted (core dumped)

or

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: read EBADF
    at TCP.onStreamRead (internal/stream_base_commons.js:201:27)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'EBADF',
  code: 'EBADF',
  syscall: 'read'
}

This is something I used to do in C++ : have a master thread handle incoming connections, and pass the fd integer to whatever thread is available. Maybe I'm misunderstanding how Nodejs handles file descriptors in the background?

The full example I wrote had a worker pool and sometimes was able to handle over 5000 requests before crashing. The crashes are random.

If it can help, here is the stress.js file I used to conduct the tests.

// stress.js
const net = require('net');
const cluster = require('cluster');

if (cluster.isMaster) {
    let reqSent = 0;
    for (let i = 0; i < 10; i++) cluster.fork().on('message', m => m == "+" && console.log(reqSent++));
} else {
    const sendReq = () => {
        const sock = net.connect(12345, 'localhost', () => {
            sock.write("Hello?", () => {
                sock.end();
                sock.destroy();
                process.send("+");
                setImmediate(() => sendReq());
            });
        });
    };
    sendReq();
}

Let me know if you need more info, or if I simply misunderstand how to use this feature.

Notes

This also happens with file streams, and sockets on top of HTTP.

Ah. After further research, I found the following note in the worker threads documentation :

Unlike with child processes, transferring handles such as network sockets is currently not supported.

Was the NodeJS team planning on implementing this feature similar to how you can send a handle using subprocess.send ? Is there a way around it using C++ or an experimental feature?

Fwiw, the reason for the crashes would most likely be that two different event loops are trying to manage the same socket and, amongst other things, close the file descriptor when they are done with it, unaware of the other event loop also handling the fd.

Unlike with child processes, transferring handles such as network sockets is currently not supported.

Was the NodeJS team planning on implementing this feature similar to how you can send a handle using subprocess.send ?

It’s on my long-term TODO list, but it gets tricky when thinking about things like Windows support. At this point the best way to do this is probably to try and re-use the code we use for child processes, and use something like libuv/libuv#1498 to create a fake internal IPC channel of some sorts to pass handles within a single process.

Is there a way around it using C++ or an experimental feature?

Not that I am aware of.

That would make a lot of sense. Since the fd is part of two different event loops, once it has to be destroyed, it's being closed and gc'ed twice.

I wonder if there would be a way to create a weak fd like you would create a weak reference? Let's say you create a new WeakSocket({ fd : some_file_descriptor }), the fd would never be disposed unless the strong reference is destroyed. That way the object could be gc'ed without the fd being disposed of, making it possible to reuse the same fd at multiple places until it is closed. This would also allow to keep raw file descriptors without it being associated with any JS stream, yet.

At the moment, once a file descriptor is associated with, let's say, a Socket, the newly created object becomes responsible for managing the entire life cycle including freeing the fd. This is absolutely necessary, but I wonder if we could somehow make that last part optional.

Something like this :

// master.js //
const someThread = new Worker("./thread.js");

const weakRef = new WeakSocket({ fd : 42 });
someThread.postMessage({ duplex_fd : weakRef._handle.fd }); // file descriptor won't be disposed yet
// thread.js //
require('worker_threads').parentPort.on('message', (msg) => {
    const strongRef = new Socket({ fd : msg.duplex_fd })
    strongRef.end("\0"); // fd can now be disposed 
});

Mind if I play around with it and open a PR, or is it something you would like to avoid?

@rykdesjardins Where would you get that fd from in the main thread?

If you want to go the easy way and are mostly concerned about lifetime management, I guess you could work on a PR that makes libuv close its handle without closing the fd itself; basically, removing the internal state that libuv has built around it, and then allowing another thread to open that fd?

libuv/libuv#390 went a bit into that direction, and I think it would be the ideal way to tackle this, but … I always got kind of stumped when thinking about how one might support something like this on Windows. 🙁 I don’t know the API there well enough, but you’d still need to pass the HANDLE around, I guess?

If this ends up making its way into Node.js core, it should definitely come with Windows support, and it should handle situations gracefully in which the receiving thread goes away before it actually receives the fd/handle.