Julia-Tempering/Pigeons.jl

Duplicated exec_folder for multi-threading calls


I often call Pigeons.pigeons inside a Threads.@threads for loop to make use of the extra computing resources I have. I am now trying to use the checkpoint feature, but I noticed that the way the exec_folder is set is not thread-safe: nothing checks that the created folder is unique, because its name is based (essentially) on the current time. As a result, if all the threads start at the same time, they end up sharing the same folder, which causes many problems with the saved checkpoints.
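A minimal sketch of the collision, assuming the folder name is derived only from a timestamp with one-second resolution. `time_based_name` is a hypothetical stand-in for illustration, not the actual Pigeons code, and the fixed start time and millisecond offsets simulate tasks launched almost simultaneously:

```julia
using Dates

# Hypothetical stand-in for a purely time-based folder name
# (an assumption for illustration, not the Pigeons implementation).
time_based_name(t::DateTime) =
    joinpath("results", "all", Dates.format(t, dateformat"yyyy-mm-dd-HH-MM-SS"))

# Simulate four tasks that start within the same second.
start = DateTime(2024, 1, 1, 12, 0, 0)
names = [time_based_name(start + Millisecond(100 * k)) for k in 0:3]

# The format drops sub-second information, so every task gets the same name.
n_distinct = length(unique(names))
```

Here `n_distinct` is 1, so all four simulated tasks would write their checkpoints into the same directory.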

To have thread-safe folders, one could just replace the current code with

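function next_exec_folder()
    # The timestamp prefix keeps folders sortable by creation time.
    formatted_time = Dates.format(now(), dateformat"yyyy-mm-dd-HH-MM-SS")
    # mktempdir needs an existing parent directory.
    mkpath("results/all")
    # mktempdir appends a random suffix, so concurrent calls get distinct folders.
    result = mktempdir("results/all"; prefix=formatted_time, cleanup=false)
    _ensure_symlinked(result)
    return result
end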

This is simpler and (it seems to me) safer than the current code, and it solves the issue. A problem remains with _ensure_symlinked(result), since that function would produce just one symbolic link, but I believe this is more complicated to solve.
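As a rough check of the folder-uniqueness part only, the following sketch calls mktempdir from several threads with the same timestamp prefix. It uses a scratch parent directory instead of results/all and leaves out _ensure_symlinked; both choices are assumptions made so the snippet runs on its own:

```julia
using Dates

# Scratch parent directory, so the sketch does not touch a real results/ folder.
parent = mktempdir()

# One shared prefix mimics all threads starting within the same second.
prefix = Dates.format(now(), dateformat"yyyy-mm-dd-HH-MM-SS")

n = 8
folders = Vector{String}(undef, n)
Threads.@threads for i in 1:n
    # Each call creates a new directory: the shared prefix plus a random suffix.
    folders[i] = mktempdir(parent; prefix=prefix, cleanup=false)
end

n_distinct = length(unique(folders))
```

Every entry of `folders` is a separate, existing directory, even though all calls used the same prefix. This says nothing about the symbolic link, which would still be shared.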

https://github.com/Julia-Tempering/Pigeons.jl/blob/ab190b885272c66992d84102fbdfcf5ebb97c0d9/src/utils/exec_folder.jl#L11C14-L11C50