balena-io-modules/node-lkl

Can't mount more than 3 partitions at once.

Closed this issue · 5 comments

zvin commented

If you try to call lkl.mountAsync on more than 3 partitions simultaneously (using Promise.all), node-lkl hangs.
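For reference, a minimal sketch of the kind of call that triggers this (the `lkl` handle and the mountAsync arguments are assumptions for illustration, not the real API or the actual test):

```js
// Hypothetical reproduction sketch: the exact mountAsync signature and the
// `lkl` handle are assumptions; the real test case is linked further down.
async function reproduce(lkl, imagePath) {
    // Four simultaneous mounts fill libuv's default pool of 4 worker threads.
    await Promise.all([1, 2, 3, 4].map((partition) =>
        lkl.mountAsync(imagePath, { partition })
    ));
    // With 3 or fewer concurrent mounts this resolves; with 4 it hangs.
}
```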

I discovered this issue while creating a reconfix branch that uses node-lkl: reconfix uses ava for its tests, and ava runs tests in parallel, which is what triggered mounting more than 3 partitions at the same time.

I have made a cant-mount-4-partitions-test branch with a test case that makes node-lkl hang.
Simply comment out one of these 4 lines and it will no longer hang.

Any idea what might be happening, @petrosagg?

zvin commented

getCapacity calls fs.fstat and its callback is never called.

If I replace getCapacity's code with something that directly calls the callback with a predefined size, the problem moves to request: it calls either fs.read or fs.write, whose callback is never called.
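For context, a rough sketch of the shape of these two callbacks as described here (the parameter names, the fd handling and the request signature are assumptions, not the actual node-lkl source):

```js
const fs = require('fs');

// Assumed: the disk image is already open; `fd` stands in for whatever
// handle node-lkl actually keeps around.
const fd = fs.openSync('/path/to/disk.img', 'r+');

function getCapacity(callback) {
    // fs.fstat is serviced by the libuv threadpool; this is the callback
    // that never fires when all the pool threads are occupied by mounts.
    fs.fstat(fd, (err, stats) => {
        if (err) {
            return callback(err);
        }
        callback(null, stats.size);
    });
}

function request(type, buffer, offset, callback) {
    // Reads and writes go through the threadpool as well, so hard-coding
    // the capacity just moves the hang here.
    if (type === 'read') {
        fs.read(fd, buffer, 0, buffer.length, offset, callback);
    } else {
        fs.write(fd, buffer, 0, buffer.length, offset, callback);
    }
}
```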

zvin commented

Replacing fs.read, fs.write and fs.fstat in request and getCapacity by their sync equivalents removes the problem (but we don't want to do that).
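A sketch of that sync workaround, under the same assumed shape (and the same `fd`) as above. It is a diagnostic only, since it blocks the main thread:

```js
// fstatSync avoids needing a free threadpool slot, which is why the hang
// disappears, but it blocks the event loop while the stat runs.
function getCapacitySync(callback) {
    try {
        callback(null, fs.fstatSync(fd).size);
    } catch (err) {
        callback(err);
    }
}
```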

petrosagg commented

@zvin what happens is that the threadpool of libuv gets into a deadlock. By default libuv starts 4 worker threads to fulfill background requests. In LKL, when you call lkl.mount() a MountWorker instance is inserted into the pool and stays there until its IO requests are fulfilled and it can call its callback.

The problem is that the IO needs to use the threadpool too. So when you call lkl.mount() 4 times synchronously, there are no slots left for the IO workers to fulfill requests and so everyone deadlocks.

You can verify that this is the case by setting the env var UV_THREADPOOL_SIZE=5 and running the tests again.

I'm not sure what the best way to fix this issue is. It looks like we need to maintain some sort of queue of pending items in LKL instead of filling the threadpool with threads stuck in a waiting state. I'll think about it more tomorrow, but this is the general issue.
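To illustrate the queueing idea, here is a JS-side sketch that caps how many mounts run at once; the actual queue would presumably live in the native layer, and this is not necessarily what the eventual fix looks like:

```js
// Illustrative only: cap concurrent mounts so that, with libuv's default
// pool of 4 threads, at least one thread stays free for the IO requests
// that each MountWorker is waiting on.
const MAX_CONCURRENT_MOUNTS = 3; // assumes the default UV_THREADPOOL_SIZE of 4

let active = 0;
const pending = [];

function queuedMount(lkl, ...args) {
    return new Promise((resolve, reject) => {
        const run = () => {
            active += 1;
            const done = () => {
                active -= 1;
                const next = pending.shift();
                if (next) {
                    next();
                }
            };
            lkl.mountAsync(...args).then(
                (result) => { done(); resolve(result); },
                (error) => { done(); reject(error); }
            );
        };
        if (active < MAX_CONCURRENT_MOUNTS) {
            run();
        } else {
            pending.push(run);
        }
    });
}
```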

zvin commented

Thanks @petrosagg !

We have another problem then: with UV_THREADPOOL_SIZE=1 we can't even mount one partition.

How I understand it so far:

  • we call mount
  • mount starts an AsyncQueueWorker
  • the AsyncQueueWorker calls our MountWorker in one of the threads of the libuv pool
  • the MountWorker calls lkl_disk_add
  • lkl calls js_get_capacity_entry
  • js_get_capacity_entry calls get_capacity (on the main thread)
  • get_capacity calls getCapacity
  • getCapacity calls fs.fstat, which requires a thread from the libuv thread pool
  • this is where we hang: the only thread in the pool is running lkl_disk_add, which is waiting for fs.fstat to complete (see the standalone sketch below).
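To show the same contention without node-lkl, here is a standalone demo of the threadpool behaviour; pbkdf2 stands in for the blocked mount worker, so this is an analogy rather than node-lkl code:

```js
// Run with: UV_THREADPOOL_SIZE=1 node demo.js
// The long pbkdf2 job occupies the only libuv pool thread, so the fs calls
// below are queued behind it and their callbacks fire only once it finishes.
// In node-lkl the occupying job (lkl_disk_add) additionally *waits* for the
// fs.fstat result, so it never finishes and the process deadlocks instead
// of merely stalling.
const crypto = require('crypto');
const fs = require('fs');

crypto.pbkdf2('secret', 'salt', 5000000, 64, 'sha512', () => {
    console.log('pbkdf2 done'); // frees the single pool thread
});

fs.open(__filename, 'r', (err, fd) => {
    if (err) throw err;
    fs.fstat(fd, (err, stats) => {
        if (err) throw err;
        // With a pool of 1 this prints only after pbkdf2 has finished.
        console.log('fstat done, size:', stats.size);
        fs.close(fd, () => {});
    });
});
```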

zvin commented

fixed by #29