bgribble/mfp

Busy patches will eventually lock up

Closed this issue · 2 comments

This is a good one.

I created a pretty large performance patch with a [mix8bus 8] and several of each of the utility patches from the bgribble/mfp-patches repo. It worked fine for about 15 minutes, then locked up.

It turns out, after some probing, that at some point we get a rapid explosion of new threads, caused by a logjam of RPC requests that never complete.

A few findings from the debugging so far:

  • In a working patch of any significant size, the bulk of the RPC traffic appears to be dsp_response objects from the [snap~] objects that feed level meters. There are 18 meters in my performance patch, each updating at about 10 Hz, so roughly 180 requests/sec just to draw meters.

I've found 2 problems so far:

  • A memory leak in RPCHost where Request objects, once added to rpc_host.pending, are never removed. That's pretty bad and would eventually kill any long-running program, but I don't think it's the immediate problem (see the sketch after this list).

  • A race (I believe) in Processor._send which can cause a single processor to deadlock. This cascades to lock up the whole patch, since every incoming dsp_response is handled by a separate worker thread, so the entire pool ends up blocked waiting on the one stuck processor.
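
For reference, here's roughly the shape of the fix for the pending-request leak. This is a minimal sketch, not mfp's actual code; the names (Request, RPCHost.pending, handle_response, request_id) are assumptions about the internals. The key point is that the Request has to be popped out of pending once its response arrives:

    import threading

    class Request:
        # Hypothetical stand-in for mfp's Request, just enough to show the leak.
        def __init__(self, request_id, payload):
            self.request_id = request_id
            self.payload = payload
            self.response = None
            self.done = threading.Event()

    class RPCHost:
        # Sketch of the pending-request bookkeeping only.
        def __init__(self):
            self.pending = {}            # request_id -> Request
            self.lock = threading.Lock()

        def submit(self, req):
            with self.lock:
                self.pending[req.request_id] = req

        def handle_response(self, request_id, response):
            # The leak: without this pop(), every completed Request stays
            # in self.pending for the life of the process.
            with self.lock:
                req = self.pending.pop(request_id, None)
            if req is not None:
                req.response = response
                req.done.set()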

There were a couple of races happening. The killer was in the Request constructor, where a sequential ID was assigned to each request but the read/increment of the shared counter was not reentrant :(
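
The fix for that one is just to serialize the read/increment of the shared counter. A minimal sketch of the bug and the corrected version, with hypothetical names rather than the real Request constructor:

    import threading

    class Request:
        _next_id = 0
        _id_lock = threading.Lock()

        def __init__(self, payload):
            # Buggy version (roughly what was there):
            #   self.request_id = Request._next_id
            #   Request._next_id += 1
            # Two threads can read the same counter value and get duplicate
            # IDs, so one request's response "completes" the wrong Request
            # and the real waiter blocks forever.
            with Request._id_lock:
                self.request_id = Request._next_id
                Request._next_id += 1
            self.payload = payload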

There's still a small memory leak somewhere in mfpdsp, but this commit fixes the problems described in this ticket.