noxdafox/pebble

Channel mutex timeout

melsabagh opened this issue · 6 comments

I have been experiencing abnormal worker terminations on a lengthy workload where per-task completion time is highly variable. I narrowed it down to the channel mutex timeout. I could get the workload to finish as expected by increasing the LOCK_TIMEOUT in the following:

LOCK_TIMEOUT = 60
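
For reference, here is roughly how I am bumping it at runtime rather than editing the installed package. This is a monkey-patch sketch; I am assuming the constant lives in pebble.pool.channel, which is internal and may move between releases:

    import pebble.pool.channel as channel

    # Assumed override: raise the internal channel lock timeout before
    # constructing the pool. LOCK_TIMEOUT is a private constant, so this
    # may stop working in a future pebble release.
    channel.LOCK_TIMEOUT = 300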

Is this expected to be part of the stable API? Is there a better way to adjust this timeout?

This timeout is not meant to be changed unless you are hitting a real corner case.

How much data are you sending to the workers or retrieving back as results?

In this particular case, each worker was handling about 500 tasks with a per-task completion time ranging from <1 second to 4 minutes.

It is not a matter of how much a worker is churning but of how much data is being transferred back and forth between the main process and the workers.

The channel mutex timeout error can happen in 3 situations:

  • The worker holding control over the channel was abruptly killed (e.g. kill -9 or the OOM killer). The channel will be stuck forever.
  • The machine is so busy that there is not even time to transfer the data on the channel. You are, for example, using far more workers than your machine has cores.
  • There is so much data being delivered to/from the main process and the workers that it takes more than 60 seconds to transfer a single work unit (the function arguments or its return value).

The channel mutex timeout is designed exactly to avoid deadlocking in the first case. This is obviously not your case, as increasing the timeout allowed your work to go through.

In the second case, you would be overcommitting so heavily that you would notice the OS struggling, which I doubt is happening.

You are most likely falling into the third case. As recommended in the multiprocessing programming guidelines, one should avoid moving large amounts of data between processes and use other mechanisms such as files (which can be mapped into memory if I/O becomes a concern). If the data you are shifting is large, I would suggest writing it into files and sending the paths to/from the workers. You will notice improved performance as well, since you are no longer trying to cram a lot of data through a 64 KiB buffer (the Linux default pipe size) designed for sequential reading.
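
As a minimal sketch of the file-based approach (process_file and the doubling computation are placeholders for your real workload, not part of pebble):

    import pickle
    import tempfile

    from pebble import ProcessPool

    def process_file(input_path):
        # Load the (potentially large) input from disk instead of the pipe.
        with open(input_path, 'rb') as f:
            data = pickle.load(f)
        result = [x * 2 for x in data]  # stand-in for the real computation
        # Write the result to disk and return only its path over the pipe.
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pkl') as out:
            pickle.dump(result, out)
            return out.name

    if __name__ == '__main__':
        # Stage a sample input file; only its path travels to the worker.
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pkl') as inp:
            pickle.dump(list(range(100_000)), inp)
        with ProcessPool() as pool:
            future = pool.schedule(process_file, args=[inp.name])
            with open(future.result(), 'rb') as f:
                print(len(pickle.load(f)))  # 100000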

The data depends on the input. I can see a file being faster for large amounts of data, but it would likely be a lot slower than a pipe if the data fits within the pipe capacity (or a few multiples of it). It sounds like I will need a per-task strategy based on the input characteristics. Thanks for the hint.

It might still be faster if you store it on a ramdisk. That said, logic that chooses whether to ship the data via the pipe or by other means would indeed work.
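
Something along these lines, as a sketch: pack and unpack are hypothetical helpers, and the 64 KiB threshold is an assumption you would want to tune against your own payloads.

    import os
    import pickle
    import tempfile

    SIZE_THRESHOLD = 64 * 1024  # assumed cutoff, roughly the Linux pipe buffer

    def pack(obj):
        # Small payloads travel inline over the pool's pipe; large ones go
        # to a temporary file so that only the path crosses the pipe.
        blob = pickle.dumps(obj)
        if len(blob) <= SIZE_THRESHOLD:
            return 'inline', obj
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pkl') as f:
            f.write(blob)
            return 'file', f.name

    def unpack(tagged):
        kind, payload = tagged
        if kind == 'inline':
            return payload
        with open(payload, 'rb') as f:
            data = pickle.load(f)
        os.unlink(payload)  # the receiving side cleans up the temporary file
        return data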

Keep in mind that all workers share the same pipe (a much faster implementation). Hence, if you stuff the pipe, all the other workers end up starving.

This is why it's faster to use the pool's pipe as the control plane and something else as the data plane when your problem deals with large chunks of data.

Closing this issue, please re-open if the problem persists.