Threading + MPI
I've had this happen when running DFTK from within threads. I'm not too clear on what we should do here.
ERROR: LoadError: TaskFailedException
nested task error: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex
@ ./essentials.jl:892 [inlined]
[2] popfirst!
@ ./array.jl:1706 [inlined]
[3] run_init_hooks()
@ MPI ~/.julia/packages/MPI/rwDDn/src/environment.jl:65
[4] Init(; threadlevel::Symbol, finalize_atexit::Bool, errors_return::Bool)
@ MPI ~/.julia/packages/MPI/rwDDn/src/environment.jl:155
[5] Init
@ ~/.julia/packages/MPI/rwDDn/src/environment.jl:114 [inlined]
[6] PlaneWaveBasis(model::Model{…}, Ecut::Float64, fft_size::Tuple{…}, variational::Bool, kgrid::MonkhorstPack, symmetries_respect_rgrid::Bool, use_symmetries_for_kpoint_reduction::Bool, comm_kpts::MPI.Comm, architecture::DFTK.CPU)
@ DFTK ~/.julia/dev/DFTK/src/PlaneWaveBasis.jl:247
[7] #PlaneWaveBasis#141
@ ~/.julia/dev/DFTK/src/PlaneWaveBasis.jl:399 [inlined]
[8] setup_calculation(s::Int64, n_electrons::Int64, b::Int64, α::Int64; scaling::Symbol, α_q::Int64, α_r::Int64)
@ Main ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:239
[9] setup_calculation
@ ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:207 [inlined]
[10]
@ Main ~/Dropbox/recherche/2020-11-anyons/new/functions.jl:244
[11] macro expansion
@ ~/Dropbox/recherche/2020-11-anyons/new/compute.jl:25 [inlined]
[12] (::var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}})(tid::Int64)
@ Main ./threadingconstructs.jl:209
[13] (::Base.Threads.var"#1#2"{var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}}, Int64})()
@ Base.Threads ./threadingconstructs.jl:154
Some type information was truncated. Use `show(err)` to see complete types.
...and 5 more exceptions.
Stacktrace:
[1] threading_run(fun::var"#33#threadsfor_fun#23"{Int64, Int64, String, Channel{Int64}}, static::Bool)
@ Base.Threads ./threadingconstructs.jl:172
[2] macro expansion
@ ./threadingconstructs.jl:189 [inlined]
[3] top-level scope
@ ~/Dropbox/recherche/2020-11-anyons/new/compute.jl:21
I remember being able to launch it in a quick and dirty way, but I am not so sure anymore…
On a local branch I enabled switching off the three parts where Threads is used.
It works most of the time, but I just had this happen once. By switching off, do you mean this? #972
Right now, for me, it works none of the time on something else I am doing…
Yes, I was indeed looking at #972, and it looks a lot like what I am using for parallel phonons.
(I think I gave up looking at how to run threads within threads because of the @timing stuff.)
Yeah, should we just disable this by default?
I have never used the fact that it's enabled by default. I've always found this surprising.
I've had this happen when running DFTK from within threads.
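I think this is because MPI is initialised twice: in the stack trace, MPI.Init is called from the PlaneWaveBasis constructor, so several threads can reach it at the same time. We should either guard the initialisation call with a semaphore (or lock), or tell MPI at initialisation that it may be called from multiple threads (I think it has a flag for that).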
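A minimal sketch of both ideas combined, in case it helps the discussion. The names `MPI_INIT_LOCK` and `ensure_mpi_initialized` are hypothetical and not part of DFTK or MPI.jl. The `threadlevel` keyword of `MPI.Init` is the one visible in the stack trace above. Whether the MPI library actually grants the `:multiple` level depends on the installation, so this is an assumption to check rather than a guarantee.

```julia
using MPI

# Hypothetical helper (not DFTK's actual code): a process-wide lock so that
# only one task at a time can run the MPI initialisation path.
const MPI_INIT_LOCK = ReentrantLock()

function ensure_mpi_initialized()
    lock(MPI_INIT_LOCK) do
        # Check inside the lock, so that a task which waited for the lock
        # sees that MPI is already up and skips MPI.Init.
        if !MPI.Initialized()
            # Request full thread support; the MPI library may provide less.
            MPI.Init(; threadlevel=:multiple)
        end
    end
    return nothing
end

# Probably the safest usage: initialise once from the main task,
# before any threads are started.
ensure_mpi_initialized()

Threads.@threads for i in 1:4
    # Calls from worker threads now find MPI initialised and do nothing else.
    ensure_mpi_initialized()
end
```

With something like this, the call inside the PlaneWaveBasis constructor could go through the guarded helper instead of calling MPI.Init directly. Initialising explicitly before the threaded loop would likely avoid the race on its own.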