Shinmera/plump

Memory safety under multiple threads


The issue I had using lquery in LispWorks turned out to be rooted in Plump. Explicitly binding the Plump DOM returned by an lquery initialize did not prevent the errors.

I think the same problem occurs in SBCL. It just looks different there and is harder to recover from. You might have to run (thash) several times, or increase :num-threads, to trigger the error.

https://gist.github.com/spacebat/860c5bb697017818474682564100559e

I've looked into Plump and tried various things. I switched all optimize declarations from favoring speed to favoring safety, checked that all relevant globals are dynamically bound, and looked for structure sharing or reuse in the string generation and manipulation, but found none. As with lquery, the only way I've found to use it safely is to put a lock around all use of the library (parsing, DOM manipulation and serialization).

I can live with the lock in place, but I thought you might be interested in this report and I'm in turn interested in your thoughts.

Cheers

I ran your example under SBCL 1.3.6.20-1528344 in a loop for about half an hour without any crashes. The same held when running the example with 200 threads. This result does not surprise me much, since I find it very hard to believe that there would be concurrency problems in this setup, especially given that I have been running web services in production that make heavy use of Plump & co. for months without any crashes or incidents related to this.

Are you sure that your tests didn't drop you into LDB due to heap exhaustion? That seems like the most likely scenario if you're running your tests in SLIME without clearing the REPL, as SLIME's REPL keeps references to the printed objects around.

Otherwise, I unfortunately can't do much to help with this. As you already discovered, there doesn't seem to be anything obvious in Plump that could cause the problem, and without more information about the circumstances, or a way to reproduce it, there's very little I can do to investigate the issue.

You're quite right: when I increase the dynamic space in SBCL and prevent the function from returning the DOM trees, the heap is no longer exhausted. The problem I'm seeing in LispWorks 6.1 is different:

<**> Stack scan : pointer 0x2 in address 0x... is out of known segments (nil), ...

It then proceeds with a memory dump and a backtrace. The most recent function on the stack is always (method serialize-object (element)). Yet it appears to be only a warning, since I still seem to get a well-formed DOM tree out regardless.

I'll close this issue, since the bug, if there is one, may be in LispWorks 6.1, and thanks to your feedback I now have more digging to do. Much obliged.

Let me know if you can find out more.