pseeth/soxbindings

soxbindings fails when multithreading

rabitt opened this issue · 21 comments

soxbindings works great in a single thread, but it consistently fails when multithreading. Minimal example below. Note that any effect (.vol, .compand, .trim, etc.) triggers the same error. Running the same script with command-line sox (replacing import soxbindings as sox with import sox) works fine.

from multiprocessing.dummy import Pool as ThreadPool
import numpy as np
import soxbindings as sox

y1 = np.zeros((4000, 1))
y2 = np.zeros((3000, 1))


def do_transform(y):
    tfm = sox.Transformer()
    tfm.vol(0.5)
    y_out = tfm.build_array(input_array=y, sample_rate_in=1000)
    return y_out


# single thread
print("running single thread")
for y in [y1, y2]:
    res = do_transform(y)
    print(res.shape)

# multithread
print("running multi thread")
pool = ThreadPool(2)
results = pool.map(do_transform, [y1, y2])

Output:

running single thread
(4000, 1)
(3000, 1)
running multi thread
Assertion failed: (fft_len == -1), function init_fft_cache, file effects_i_dsp.c, line 170.
Abort trap: 6

Thanks! I can look into this. Appreciate the concise report and repro instructions! FWIW, I've been using SoxBindings in a multi-process dataloader from PyTorch with no issues, so maybe try multiple processes as a quick fix for now.

so maybe try multiple processes as a quick fix for now.

Unfortunately I don't think it's possible in my case - I'm using it inside a tensorflow dataloader which uses multithreading. I don't know of a way around it.

@rabitt I guess one has to disable OpenMP when compiling sox: ./configure --disable-openmp. No idea if there is anything that can be done from within the bindings (after compiling)...

the folks at torchaudio seem to have had the same problem: pytorch/audio#1026

@pseeth ?

Thank you for that pointer @faroit! SoxBindings has a slightly different issue, though - that one appears to be caused by a mismatch between PyTorch's OpenMP and libsox's OpenMP. But, good news: I might have the beginnings of a fix, thanks to the rabbit hole this led me down...

Thank you to this hero on the SoX forums.

Here's the gist: I made a context manager, sox_context_manager, that build_flow_effects is called within. The context manager initializes SoX (sox_init) before running the effects chain, then shuts SoX down (sox_quit), like this:

sox_init
build_flow_effects
...
...
...
...
sox_quit
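In Python terms, the pattern is roughly the following (a generic sketch using contextlib; sox_init and sox_quit here are stand-ins that just record the call order, where the real ones call into libsox):

```python
from contextlib import contextmanager

calls = []  # records the order of operations, for illustration


def sox_init():
    calls.append("sox_init")


def sox_quit():
    calls.append("sox_quit")


@contextmanager
def sox_context_manager():
    # Initialize SoX on entry, and shut it down on exit -
    # even if the effects chain raises.
    sox_init()
    try:
        yield
    finally:
        sox_quit()


with sox_context_manager():
    calls.append("build_flow_effects")

print(calls)  # -> ['sox_init', 'build_flow_effects', 'sox_quit']
```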

So this works great when you're doing it in a single thread, but in a multithreaded setup, you end up with this very bad scenario:

thread 1                              thread 2
sox_init                                
build_flow_effects       
...                                       sox_init
...                                       build_flow_effects
...                                       ...
...                                       ...
sox_quit                            ...
                                         sox_quit

So the inits and quits happen interleaved, which SoX really doesn't like. As the person says on the forum:


You are initializing SoX twice.
Fixed your example by moving the sox_init() outside the loop.

Tested and working with "alsa" instead of "coreaudio".

Cheers,

-Pascal

Cheers indeed. I took the decorator off. You'll now have to wrap your program's main function in the context manager to be thread-safe, or call initialize_sox and quit_sox at the beginning and end of your program, respectively. I'll have to figure out the best way to fix this so that single-threaded SoxBindings programs are unaffected, since taking the decorator off will break things the other way. I added a test case to SoxBindings covering the multi-threaded case, based on @rabitt's example code here.

@pseeth thanks for looking into it. Not sure if this fix can be used inside TensorFlow's tf.data (which runs in graph mode), but it's certainly a huge step forward!

Hmm, not super familiar with tf.data. I'll have to take a look, but say you have a program with a main function that runs your experiment with augmentation. Once a fix that takes the decorator off has been deployed in SoxBindings, you should be able to just do:

from soxbindings import sox_context_manager

@sox_context_manager()
def main():
    # my great experiment goes here,
    # powered by soxbindings and tf.data
    pass

if __name__ == "__main__":
    main()

But I'm not sure totally if that'll work, having not used tf.data. If you can point me to some code with tf.data, I can take a look and try to make sure the fix here works there too.

But I'm not sure totally if that'll work, having not used tf.data. If you can point me to some code with tf.data, I can take a look and try to make sure the fix here works there too.

Let me see if I can cook up a minimal tf.data example for you

I think this does it -

import numpy as np
import tensorflow as tf
import soxbindings as sox


def do_transform(y):
    tfm = sox.Transformer()
    tfm.vol(0.5)
    y_out = tfm.build_array(input_array=y, sample_rate_in=1000)
    y_out = tf.cast(y_out, tf.float32)
    return y_out


def transform_in_graph(y):
    return tf.numpy_function(do_transform, [y], tf.float32)


def random_noise_generator():
    for _ in range(50):
        yield np.random.uniform(size=(4000, 1))


ds = tf.data.Dataset.from_generator(
    random_noise_generator, output_types=tf.float32, output_shapes=(4000, 1)
)
ds = ds.map(transform_in_graph, num_parallel_calls=4)  # change this to 1, it succeeds
for y in iter(ds):
    print(y.shape)

Obviously in this example there's an easy workaround (num_parallel_calls=1) but when training you often need two instances of a tf.data.Dataset (for train/test) and these appear to run in multiple threads.

With the changes in #5, this snippet works:

import numpy as np
import tensorflow as tf
import soxbindings as sox


def do_transform(y):
    tfm = sox.Transformer()
    tfm.vol(0.5)
    y_out = tfm.build_array(input_array=y, sample_rate_in=1000)
    y_out = tf.cast(y_out, tf.float32)
    return y_out


def transform_in_graph(y):
    return tf.numpy_function(do_transform, [y], tf.float32)


def random_noise_generator():
    for _ in range(50):
        yield np.random.uniform(size=(4000, 1))


ds = tf.data.Dataset.from_generator(
    random_noise_generator, output_types=tf.float32, output_shapes=(4000, 1)
)
ds = ds.map(transform_in_graph, num_parallel_calls=4)

with sox.sox_context_manager(): # <- THE FIX
    for y in iter(ds):
        print(y.shape)

I'll work on getting it released ASAP! Thanks all for the snippets and pointers!

@faroit, @rabitt would you mind trying these steps to see if your SoxBindings related code works?

  1. Install the branch with the fix:

    pip install -U git+https://github.com/pseeth/soxbindings.git@multithread-fix

  2. Modify your multi-threaded code using the context manager. See this part of the README for what to do.

Hopefully it works! Let me know, and then I'll merge PR #5 and release it as soxbindings==1.2.3.

@pseeth thanks a lot, the fix works fine and can be merged as is! Unfortunately, it still seems that the interface significantly slows down the tf.data pipeline and real multiprocessing can't be utilized even if the number of parallel calls is set to a value higher than 1...

@pseeth (sorry for the delay) I can also confirm it's working in my setup!

it still seems that the interface significantly slows down the tf.data pipeline and real multiprocessing can't be utilized even if the number of parallel calls is set to a value higher than 1...

@faroit you're totally right, though it's not soxbindings' fault. I had an offline discussion with @psobot and he looked into it a bit - the sox C library itself can't do real multithreading. Still, 1x soxbindings is ~10x faster than 1x sox!

the sox C library itself can't do real multithreading.

Just to clarify here - the limiting factor seems to be that soxbindings doesn't release Python's global interpreter lock (GIL), meaning that even if the underlying Sox code is thread-safe (unsure), independent Python threads are prevented from executing the Sox code in parallel due to the GIL.
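A quick way to see what releasing the GIL buys: calls that drop the GIL can overlap across threads, while calls that hold it serialize. The sketch below uses time.sleep as a stand-in for a GIL-releasing C call (this is not soxbindings code, just an illustration of the threading behavior):

```python
import threading
import time


def gil_releasing_call():
    # time.sleep drops the GIL while it waits, the way a C extension
    # wrapped in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS would.
    time.sleep(0.2)


start = time.perf_counter()
threads = [threading.Thread(target=gil_releasing_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.2 s calls overlap, so wall time is ~0.2 s, not ~0.8 s.
# A bound call that never drops the GIL would run the threads one
# after another instead.
print(f"elapsed: {elapsed:.2f} s")
```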

Hmm, okay. I'll merge this fix in sometime today - do you have any suggestions for how to speed things up, @psobot? I very much appreciate the advice! SoX should be threadsafe: https://sox-users.narkive.com/m1PQmcwp/is-sox-threadsafe, so I imagine it's possible. I'm not sure if I did anything too crazy in my bindings, though - everything in the execution of the build function (which calls soxbindings.sox) shouldn't touch global state or anything like that. If something is locking, it might be in the bindings. I just came across this: https://stackoverflow.com/questions/60915627/is-pybind11-pyarray-object-thread-safe. And also this: https://stackoverflow.com/questions/47309688/how-to-use-pybind11-in-multithreaded-application. That would be unfortunate, but might be fixable at some point.

Edit: this might be what we need: https://docs.python.org/3/c-api/init.html#releasing-the-gil-from-extension-code. I'll try it out at some point. Merging the related PR for now, though.

Happy to help, @pseeth! I see you edited with the correct link - that should do it. If SoX is threadsafe under the hood, then releasing the GIL before calling SoX (and re-acquiring it afterwards before returning to Python code) should be all you need.

@pseeth did you have some time to try this out? I can certainly try to help out...

Unfortunately not yet. It feels like it should just be a matter of putting those two lines to release the GIL somewhere in the C extension inside SoxBindings, though, right? I'd definitely CR it promptly if you made a PR, though!

Sorry for the delay - 1.2.3 is now out on pip! Closing this issue for now. Please re-open if you run into any issues!

Sorry for the delay - 1.2.3 is now out on pip!

that's great! thanks

Closing this issue for now. Please re-open if you run into any issues!

i think we should keep this open until 👇 is addressed, or rename the issue, or create a new one

Unfortunately not yet. It feels like it should just be a matter of putting those two lines to release the GIL somewhere in the C extension inside SoxBindings, though, right? I'd definitely CR it promptly if you made a PR, though!

Let's make a new issue. This issue was originally about avoiding a show-stopping error which I can say is solved (and has a different solution that should stay documented in this issue). I'll make a new one about avoiding GIL.

sounds good. I will try to look into it soon, but my C++ skills are a bit rusty ;-) @psobot ?