joyent/libuv

Invalid write when closing process handle after 1.0.0-rc1

Closed this issue · 7 comments

I've just tried testing neovim lua client with libuv 1.0.0 but got some memory errors that were not present in libuv 0.11.29. Here's the relevant valgrind stack strace:

==5783== Invalid write of size 8
==5783==    at 0x656EAC8: uv__process_close (process.c:515)
==5783==    by 0x656BCA4: uv_close (core.c:138)
==5783==    by 0x6569C00: walk_cb (loop.c:31)
==5783==    by 0x656B797: uv_walk (uv-common.c:267)
==5783==    by 0x6569EA9: loop_delete (loop.c:119)

It happens when the loop is about to be garbage collected by the lua runtime: https://github.com/neovim/lua-client/blob/master/nvim/loop.c#L115-L126.

This error can be reproduced on a 64-bit linux with these commands(all dependencies are automatically downloaded):

g clone -b nvim-nightly git://github.com/tarruda/lua-client
make valgrind

After make valgrind finishes, the report will be printed to the screen(might take more than one try to make it happen).
A gdb shell can be entered at the moment the invalid write happens like this:

$ make VALGRIND_OPTS="--log-file=valgrind.log --leak-check=yes --track-origins=yes --vgdb=yes --vgdb-error=1" valgrind

When the test pauses, open another terminal and enter:

$ gdb .deps/usr/bin/lua
(gdb) target remote | vgdb

I believe the problem was introduced in @2d5eaea1 (reverting that commit fixes the errors) though it could have been me misusing the API in a way that was only apparent after @2d5eaea1.

I believe the crash happens because at line 100 the QUEUE_INIT(q); statement is removed.
This leads to a crash in QUEUE_REMOVE later on (QUEUE_REMOVE is a no-op if q is an empty queue, which is why it didn't crash in the old situation).

I've created small test case that reproduces the error:

#include <stdlib.h>
#include <stdio.h>

#include <uv.h>

static void exit_cb(uv_process_t *proc, int64_t status, int term_signal)
{
  uv_stop(uv_default_loop());
}

int main() {
  uv_process_t proc;
  uv_process_options_t opts;
  char *argv[] = {"cat", "-", NULL};

  opts.file = argv[0];
  opts.args = argv;
  opts.stdio_count = 0;
  opts.flags = UV_PROCESS_WINDOWS_HIDE;
  opts.env = NULL;
  opts.cwd = NULL;
  opts.exit_cb = exit_cb;

  int status;
  if (status = uv_spawn(uv_default_loop(), &proc, &opts)) {
    const char *error = uv_strerror(status);
    fprintf(stderr, "%s\n", error);
    exit(1);
  }

  printf("Starting\n");
  uv_run(uv_default_loop(), UV_RUN_DEFAULT);
  uv_close((uv_handle_t *)&proc, NULL);
  while (uv_loop_close(uv_default_loop())) {
    uv_run(uv_default_loop(), UV_RUN_ONCE);
  }
  printf("Exited successfully\n");
}

Result of running the above program under valgrind(ubuntu 12.04 64 bit):

==4362== Invalid write of size 8
==4362==    at 0x407DB8: uv__process_close (process.c:515)
==4362==    by 0x404F94: uv_close (core.c:138)
==4362==    by 0x4037C5: main (invalid_write.c:33)
==4362==  Address 0xffeffc820 is not stack'd, malloc'd or (recently) free'd
==4362==
==4362== Invalid write of size 8
==4362==    at 0x407DC3: uv__process_close (process.c:515)
==4362==    by 0x404F94: uv_close (core.c:138)
==4362==    by 0x4037C5: main (invalid_write.c:33)
==4362==  Address 0xffeffc828 is not stack'd, malloc'd or (recently) free'd

Again, reverting @2d5eaea makes the error go away

I believe the crash happens because at line 100 the QUEUE_INIT(q); statement is removed.
This leads to a crash in QUEUE_REMOVE later on (QUEUE_REMOVE is a no-op if q is an empty queue, which is why it didn't crash in the old situation).

I need to further investigate, but IIRC I removed it because after removing the handle from one queue we add it to another, I'll check nevertheless.

@tarruda thanks, that really helps.

I can confirm that reverting the processing of the pending list in 2d5eaea, then problem goes away in Neovim: jszakmeister/libuv@80afe31.

I also tried adding in the QUEUE_INIT(q) bit--just blindly as I don't understand the significance of the line (jszakmeister/libuv@96b45e0)--and I end up with an assertion error while running the unit tests for Neovim:

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●luajit: /home/jszakmeister/projects/neovim/.deps/build/src/libuv/src/unix/core.c:95: uv_close: Assertion `!(handle->flags & (UV_CLOSING | UV_CLOSED))' failed.
CMake Error at /home/jszakmeister/projects/neovim/cmake/RunTests.cmake:23 (message):
  Running unit tests failed.


ninja: build stopped: subcommand failed.

BUILD FAILED

Allright, I think I caught it. @piscisaureus was right, when a process ends successfully we need to QUEUE_INIT the queue otherwise QUEUE_REMOVE will write those pointers when the process is closed.

@tarruda, can you test this patch?

diff --git a/src/unix/process.c b/src/unix/process.c
index 0aff5fd..be283b4 100644
--- a/src/unix/process.c
+++ b/src/unix/process.c
@@ -85,9 +85,14 @@ static void uv__chld(uv_signal_t* handle, int signum) {
     QUEUE_INSERT_TAIL(&pending, &process->queue);
   }

-  QUEUE_FOREACH(q, &pending) {
+  h = &pending;
+  q = QUEUE_HEAD(h);
+  while (q != h) {
     process = QUEUE_DATA(q, uv_process_t, queue);
-    QUEUE_REMOVE(q);
+    q = QUEUE_NEXT(q);
+
+    QUEUE_REMOVE(&process->queue);
+    QUEUE_INIT(&process->queue);
     uv__handle_stop(process);

     if (process->exit_cb == NULL)

Worked for me locally, but it would be great if yoiu could test it integrated in neovim.

Worked for me locally, but it would be great if yoiu could test it integrated in neovim.

👍 Worked nicely. When you push a new tag with this patch we'll point neovim bundled libuv to it, and it will be checked on every PR by our CI(we'll try to always use the latest libuv tag)

Thanks for confirming @tarruda! Landed in libuv/libuv@c0ea37c