Z3Prover/z3

proposal: reduce concurrency for ocaml bindings

c-cube opened this issue · 12 comments

We've long had issues with Z3 and the OCaml bindings at Imandra, and had to avoid threads entirely because of that.

cc @giltho @XVilka @ivg

Problem description

I finally made a small repro for the issue (see below), which can occur as soon as there are at least 2 OCaml threads, one of which uses Z3. My understanding of the problem is roughly this:

  • thread A calls Z3, creates Z3 ASTs, expressions, etc. The AST nodes are wrapped in OCaml values via custom blocks (a special kind of block that lives in the OCaml heap, and can contain C/C++ data).
  • thread B does other stuff.
  • at some point, A calls a long-running Z3 function (e.g. Z3.Solver.check solver) and releases the runtime lock (a long call into C code, which allows some parallelism). At this point A's program counter is inside C++.
  • meanwhile, while A is inside Z3's solving code, B triggers the GC for unrelated reasons. The GC collects dead values, some of which happen to be dead OCaml references to AST nodes. The AST nodes themselves are still alive and in use inside Z3; it's only the OCaml references that are dead.
  • B calls the finalizer on an OCaml wrapper of an AST node. This finalizer decrements a refcount inside Z3.

problems:

  • the refcount is not atomic, so concurrent updates can corrupt it
  • if the refcount reaches 0, thread B is going to release the AST node. It goes through code in ast.h/ast.cpp that is not thread-safe, while thread A might concurrently be allocating other, unrelated AST nodes through the same code.

both of these are grounds for race conditions, even though only thread A ever used Z3 explicitly.

Basic fix

The simpler fix is to avoid releasing the runtime lock when calling a long-running Z3 function (simplify, solve, etc.). This means B cannot run at the same time as A.

However, it's annoying because now the program can do absolutely nothing else while Z3 is being called. Even with several distinct, independent Z3 instances, only one can work at a given time.

Harder (potential) fix

I made a quick attempt at making the deallocation path of AST nodes thread-safe. Unsurprisingly, it didn't work; the question is how much effort would be required to make it work, and whether it is worth it (and worth the maintainers' sanity).

The basic approach I started with is to have refcounting use atomic&lt;unsigned&gt; for AST nodes (and in parts of the OCaml stubs), coupled with a mutex that, when a refcount reaches 0, protects the deallocation path and everything that touches ast_manager::m_ast_table.

Repro

The following archive contains a small OCaml project:

reg_thread.tar.gz

$ tar xvf reg_thread.tar.gz
$ cd reg_thread
$ dune exec bin/main.exe
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t1: solve
t2: gc
t1: solve
fish: Job 1, 'dune exec bin/main.exe' terminated by signal SIGSEGV (Address boundary error)

Note that the Java and C# API bindings put objects into a queue that gets flushed in the main thread, where the dec-ref happens.
OCaml doesn't have this queue so far because the old GC wasn't concurrent.

Ah!! Very interesting. When is the queue flushed? Is it only after calls to solve/simplify/… ? Or after each operation perhaps?

See the code I pushed. I call flush_objects() on context deletion, of course, and on inc_ref of an AST.
Other places are possible; I'm not sure they matter, since there is no action without creating expressions
(you could create a lot of empty solvers).

I am pushing a change to the OCaml API to turn this feature on. It would be great if you can test it.

I'm a bit puzzled by this bug report. I thought you could not reuse a context between threads.
It seems that the reported usage violates this criterion. Can't the bug be fixed in the client code and/or the OCaml plugin?

In the example, the context is not reused.
The issue is the garbage collector. In (non-multicore) OCaml, only one thread can run OCaml code at a time (the thread that holds the runtime lock). Typically, for long-running foreign calls (such as to Solver.check), we'd release this lock. Then another OCaml thread can run -- this thread, however, can invoke the garbage collector, which may free some Z3-related objects, leading to the refcount race.
I guess technically the context is being accessed from two threads, but it's not the API user doing this; rather, it's the OCaml runtime.
In OCaml bindings for some other libraries I'm familiar with, this is managed by delaying finalization. Instead of decreasing the refcount, objects that would be deleted on the wrong thread are added to a queue, which is then processed on the correct thread (for example, when the solver returns).

Here's a concrete proposal for how to do this:

  1. Before calling caml_release_runtime_system, set a flag (it does not need to be atomic, but it does need to be per-context)
  2. When a finalizer runs and the flag is set, instead of decreasing the refcount, add the pointer to a global queue (in C)
  3. After calling caml_acquire_runtime_system, work through the queue decreasing refcounts, and reset the flag

The queue should be attached to the context, maybe we can put it into Z3_context_plus_data

EDIT:
Ah sorry I should have read the entire thread, this is very similar to the proposal by @NikolajBjorner, except I flush the queue on acquiring the runtime lock.

Maybe something like this: master...smuenzel:z3:ocaml-deletion-queue
Completely untested so far.

What I have pushed to GitHub works with contexts migrating between threads. This is not an issue with this solution.
What I am asking is whether you can test the solution. It should work unless I have missed something.
@nunoplopes - I hope it is clear, given that you have looked at the new code: it lives inside Z3 and allows dec-ref to be called from threads different from the one where the operations on the context happen (operations on the context must still be serialized; if you call enable_concurrent_dec_ref(), dec-ref operations will work from other threads).

I will test the branch, indeed. Starting with the regression test provided here :).

@smuenzel I had more or less the same idea before I tried to directly make dec_ref thread-safe. Indeed, a Z3_ast_vector (I think?) could be attached to Z3_context_plus_data and used only if some atomic flag is set.

I'm glad I submitted this issue instead of trying to solve it entirely on my own.

If my approach works, we can also simplify the C# and Java APIs by using this setting. It simply pushes the functionality they implement into Z3, so instead of having three different APIs do the same book-keeping, it is handled internally (and more simply than the C# and Java versions).

Trying with https://github.com/c-cube/z3-ocaml-multithread-test .

On my older version of Z3 it generally crashes after a few seconds (at most). I'm going to keep running it for minutes with the current master (527914d) but so far it looks promising 🤞

After several tries, one of which lasted 45 minutes, no segfaults. I think this is definitely much better than the previous state of things, even if some corner case may still be lurking. Thank you very much @NikolajBjorner !!