angr/claripy

z3 may generate internal constraints from incremental solves - these get lost during serialization

Opened this issue · 12 comments

stef commented

Howdy, and thank you very much for this awesome piece of work!

I'm building a model with a script, which takes about 50 seconds. I was wondering if it is possible to serialize the solver object and reload it later, and thus shave off most of the static pre-computation time. I tried a naive:

        solver = pickle.loads(pickle.dumps(solver, -1))

but not only does it take a lot of time, it also fails to compute the correct solution. Without this line everything works fine. Any help appreciated.

and thank you again for your hard and awesome work on claripy (and angr)

Solvers have lots of unserializable internal data, but you should be able to serialize a set of constraints and reload those.

        >>> pickle.loads(pickle.dumps(a == 12))
        <Bool a_0_32 == 0xc>

stef commented

Thanks! I wonder if I can do

        pickle.loads(pickle.dumps(solver.constraints))

Yes, you should be able to. solver.constraints is just a python list of AST constraints.
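Since solver.constraints is just a list of picklable Python objects, the round trip is a plain pickle dump and load. A minimal sketch of the pattern (the claripy-specific names here are assumed from the discussion above, not a claripy API):

```python
import pickle

def save_constraints(s):
    # Serialize the solver's constraint list with the highest pickle protocol
    # (equivalent to the -1 used earlier in this thread).
    return pickle.dumps(s.constraints, pickle.HIGHEST_PROTOCOL)

def restore_constraints(blob, solver_factory):
    # Build a fresh solver (e.g. claripy.Solver) and feed it the
    # deserialized constraint list; claripy's add() accepts a list.
    new_solver = solver_factory()
    new_solver.add(pickle.loads(blob))
    return new_solver
```

Note that this only preserves the constraints claripy tracks, which is exactly what turns out to matter later in this thread.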

stef commented

Strange: if I insert this code into mine right before I call s.eval(), it just makes my fan go loud:

        import pickle
        c = pickle.dumps(s.constraints)
        s = Solver()
        s.add(pickle.loads(c))

If I remove those lines, s.eval() works. I checked, and the only thing I do beforehand is:

s = claripy.Solver()
...
...
s.add(...)
s.add(...)
assert s.solution(testvalues...)

Can you insert a print of the result between pickle.loads and s.add? My guess is that serializing and deserializing just takes unreasonably long.

stef commented

I did this:

        print(s.constraints, file=sys.stderr)
        print("asdfqwer", file=sys.stderr)
        import pickle
        start = time.time()
        print(f"[t] 00:00:00 serializing constraints")
        c = pickle.dumps(s.constraints)
        delta = datetime.timedelta(seconds=time.time() - start)
        print(f"[t] {str(delta)} serialized constraints")
        s = Solver()
        s.add(pickle.loads(c))
        print(s.constraints, file=sys.stderr)
        delta = datetime.timedelta(seconds=time.time() - start)
        print(f"[t] {str(delta)} deserialized constraints")
        try:
            sol = s.eval(..., 1, extra_constraints=[...])
        except:
            delta = datetime.timedelta(seconds=time.time() - start)
            print(f"[t] {str(delta)} unsat exception")
            raise

this is the output:

[t] 00:00:00 serializing constraints
[t] 0:00:00.026087 serialized constraints
[t] 0:00:12.140778 deserialized constraints
[t] 0:05:13.386825 unsat exception

I wrote the constraints to stderr, separated by that fancy delimiter, manually split the file into pre- and post-pickling parts, and according to diff(1) the two files are identical. It also turns out that both files are about 25 megabytes in size...

The fact that you're getting an exception after exactly 5 minutes means we're getting a timeout result back from z3. That's concerning; I don't know how serialization could affect what we're sending down to z3. I don't really know how to help you debug this other than to point you toward backend_z3.py and suggest you instrument the interface between claripy and z3, to detect any major differences between what happens in the two cases.
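One generic way to do that instrumentation is a timing wrapper applied to the backend's solve entry points. This is a sketch, not claripy's actual API; the attribute name at the bottom is hypothetical:

```python
import functools
import time

def traced(log):
    """Decorator factory: append (function name, duration in seconds) to `log` per call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                log.append((fn.__name__, time.perf_counter() - start))
        return wrapper
    return decorate

# Usage sketch: wrap a backend method, run both the incremental and the
# serialize/deserialize variants, then compare the two logs.
# calls = []
# backend._solve = traced(calls)(backend._solve)  # hypothetical attribute name
```

Comparing the per-call logs from the two runs would show whether the same queries are reaching z3 and where the time goes.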

stef commented

Would you have a look if I shared the script with you?


I finally had some time to look into this. It turns out the issue has nothing to do with serialization and everything to do with the fact that z3 generates intermediate assertions because you're doing incremental sanity checks throughout your script. These assertions never make it back out of z3 into claripy, so when you discard the old solver and add the constraints to a new one, z3 has to redo all its work from scratch, which is evidently much harder than the incremental approach.

I'm not sure what the right approach to solving this is. We probably need to pull the extra constraints that z3 generates back into Python land, but when? How should annotations be handled? Should the extra constraints be treated as equals to user-added constraints?

stef commented

Interesting to know that z3 adds these incremental assertions due to my sanity checks. How about pulling the extra z3 asserts back into Python land when the dev explicitly calls a hey_claripy_z3_might_know_more_than_you_think_get_the_extra_from_z3() function? Or maybe just check upon serialization whether claripy and z3 agree on the model?

stef commented

By the way, for anyone curious, the script serving as the test case for this can be found here: https://github.com/stef/px1000cr/blob/master/px1k-claripy.py