Race-condition when using Metatheory from multiple threads
Opened this issue · 3 comments
I'm getting errors when calling saturate for an egraph from multiple threads.
I'm careful not to use any shared variables. The race condition seems to occur inside Metatheory (ale/3.0 branch).
Here is the test script
using Metatheory
function test_threads()
println("thread pool size: $(Threads.threadpoolsize())")
Threads.@threads for i in 1:1000
theory = @theory a b c begin
a + b == b + a
a * b == b * a
a + (b + c) == (a + b) + c
a * (b * c) == (a * b) * c
end
g = EGraph{Expr}(:(1 + (2 + (3 + (4 + (5 + 6))))));
params = SaturationParams(timeout=100)
report = saturate!(g, theory, params)
# println(g)
# println(report)
end
end
Calling this from a julia instance with 12 threads julia -t 12
: leads to the following output (example)
julia> test_threads()
thread pool size: 12
ERROR: TaskFailedException
nested task error: WARNING: both TermInterface and Meta export "isexpr"; uses of it in module Metatheory must be qualified
BoundsError: attempt to access 81-element Vector{UInt64} at index [247]
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] getindex
@ ./abstractarray.jl:1291 [inlined]
[3] find
@ ~/.julia/dev/Metatheory/src/EGraphs/unionfind.jl:21 [inlined]
[4] find
@ ~/.julia/dev/Metatheory/src/EGraphs/egraph.jl:226 [inlined]
[5] canonicalize!(g::EGraph{Expr, Nothing}, n::Vector{UInt64})
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/egraph.jl:250
[6] add!(g::EGraph{Expr, Nothing}, n::Vector{UInt64}, should_copy::Bool)
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/egraph.jl:279
[7] instantiate_enode!(bindings::SubArray{UInt128, 1, Vector{…}, Tuple{…}, true}, g::EGraph{Expr, Nothing}, p::PatExpr)
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/saturation.jl:133
[8] apply_rule!(bindings::SubArray{…}, g::EGraph{…}, rule::EqualityRule, id::UInt64, direction::Int64, merges_buffer::OptBuffer{…})
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/saturation.jl:145
[9] eqsat_apply!(g::EGraph{…}, theory::Vector{…}, rep::Metatheory.EGraphs.SaturationReport, params::SaturationParams, ematch_buffer::OptBuffer{…}, merges_buffer::OptBuffer{…})
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/saturation.jl:241
[10] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:237 [inlined]
[11] eqsat_step!(g::EGraph{…}, theory::Vector{…}, curr_iter::Int64, scheduler::Metatheory.EGraphs.Schedulers.BackoffScheduler, params::SaturationParams, report::Metatheory.EGraphs.SaturationReport, ematch_buffer::OptBuffer{…}, merges_buffer::OptBuffer{…})
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/saturation.jl:299
[12] saturate!(g::EGraph{Expr, Nothing}, theory::Vector{RewriteRule}, params::SaturationParams)
@ Metatheory.EGraphs ~/.julia/dev/Metatheory/src/EGraphs/saturation.jl:336
[13] macro expansion
@ ~/cuda/test_parallel_eqsat.jl:17 [inlined]
[14] (::var"#10#threadsfor_fun#2"{var"#10#threadsfor_fun#1#3"{UnitRange{Int64}}})(tid::Int64; onethread::Bool)
@ Main ./threadingconstructs.jl:214
[15] #10#threadsfor_fun
@ Main ./threadingconstructs.jl:181 [inlined]
[16] (::Base.Threads.var"#1#2"{var"#10#threadsfor_fun#2"{var"#10#threadsfor_fun#1#3"{UnitRange{Int64}}}, Int64})()
@ Base.Threads ./threadingconstructs.jl:153
...and 10 more exceptions.
Stacktrace:
[1] threading_run(fun::var"#10#threadsfor_fun#2"{var"#10#threadsfor_fun#1#3"{UnitRange{Int64}}}, static::Bool)
@ Base.Threads ./threadingconstructs.jl:171
[2] macro expansion
@ Main ./threadingconstructs.jl:219 [inlined]
[3] test_threads()
@ Main ~/cuda/test_parallel_eqsat.jl:7
[4] top-level scope
@ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
With a single thread there is no error.
What branch are you on? Are you on latest MT ale/3.0
? We stopped using global buffers. The thing is, if you saturate a different e-graph object per-thread, this shouldn't happen and it's indeed a bug.
If you instead are trying to call saturate!
on the same e-graph but from different threads, there's no mechanism for parallel saturation, and it's a hard task. Searching is fine, but applying matches must happen sequentially and after you union e-classes, ids are going to change, so if you change the ordering you also change the behavior.
I pulled the latest changes from ale/3.0
but the problem persists. Am I missing something?
(@v1.10) pkg> status Metatheory
Status `~/.julia/environments/v1.10/Project.toml`
[e9d8d322] Metatheory v3.0.0 `~/.julia/dev/Metatheory`
shell> cd ~/.julia/dev/Metatheory/
/home/gkronber/.julia/dev/Metatheory
shell> git status
On branch ale/3.0
Your branch is up to date with 'origin/ale/3.0'.
nothing to commit, working tree clean
shell> git log
commit 7709c902afbfd880405ec0c0916eee95c2e6d662 (HEAD -> ale/3.0, origin/ale/3.0)
Author: Alessandro Cheli <a@a.a>
Date: Fri Jun 28 12:14:14 2024 +0200
remove merges buffer
Yes, I only need the first case: saturate independent egraph objects on parallel threads.
I pulled the latest changes from
ale/3.0
but the problem persists. Am I missing something?(@v1.10) pkg> status Metatheory Status `~/.julia/environments/v1.10/Project.toml` [e9d8d322] Metatheory v3.0.0 `~/.julia/dev/Metatheory` shell> cd ~/.julia/dev/Metatheory/ /home/gkronber/.julia/dev/Metatheory shell> git status On branch ale/3.0 Your branch is up to date with 'origin/ale/3.0'. nothing to commit, working tree clean shell> git log commit 7709c902afbfd880405ec0c0916eee95c2e6d662 (HEAD -> ale/3.0, origin/ale/3.0) Author: Alessandro Cheli <a@a.a> Date: Fri Jun 28 12:14:14 2024 +0200 remove merges buffer
Yes, I only need the first case: saturate independent egraph objects on parallel threads.
I will investigate further. I think it's because each rule has its own custom callstack