Index of current worker in parallelized MC

Question

astronasko opened this issue 3 years ago · 3 comments

astronasko commented 3 years ago

General information:

emcee version: 3.0.2.
platform: Devuan GNU/Linux 2.1
installation method (pip/conda/source/other?): conda

Problem description:

Greetings,

According to the Parallelization page of the emcee documentation, MCMC sampling is parallelized as follows:

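import emcee
from multiprocessing import Pool

with Pool(n) as pool:
    # args must be passed by keyword: the 4th positional parameter of EnsembleSampler is pool
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=args, pool=pool)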

Is it possible to get the index of the current worker in the pool so that it can be used in args?

Answer 1 · 2021-07-28T16:04:28.000Z

This is not supported, but can you say a little more about what you'd use it for? One option is to use the process ID, which is unique to each worker process in the pool.
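A minimal sketch of the process-ID suggestion, using only the standard library: each task run by a `multiprocessing.Pool` reports `os.getpid()`, and tasks handled by the same worker report the same value. The function names `which_worker` and `collect_worker_pids` are hypothetical; with emcee, the `os.getpid()` call would go inside `log_prob`, since emcee dispatches `log_prob` evaluations through the pool's `map` method.

```python
import os
from multiprocessing import Pool


def which_worker(task_id):
    # Runs inside a pool worker; os.getpid() identifies that worker process.
    return task_id, os.getpid()


def collect_worker_pids(n_workers=2, n_tasks=8):
    with Pool(n_workers) as pool:
        results = pool.map(which_worker, range(n_tasks))
    # Distinct PIDs seen across all tasks (at most n_workers of them).
    return {pid for _, pid in results}


if __name__ == "__main__":
    pids = collect_worker_pids()
    print(f"parent {os.getpid()}, workers {sorted(pids)}")
```

PIDs are unique only among processes that are alive at the same time; the OS may reuse a PID after a process exits, so this identifies workers only for the lifetime of the pool.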

Answer 2 · 2021-07-28T16:19:34.000Z

Thank you for your reply! My log_prob function calls third-party software that generates files, which log_prob then reads and processes. I can neither change nor predict the names of these files, so if they are all generated in the same directory, there is no guarantee that each worker reads its own file.

I can, however, change the directory in which the files are generated, so I was thinking of a way to "isolate" each worker process in a separate subdirectory to avoid race conditions. I will try os.getpid() and report back. Thank you again!
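The per-process isolation idea can be sketched as follows, under stated assumptions: `run_external_tool` is a hypothetical stand-in for the third-party software, which here always writes to a fixed file name (`result.txt`) in its working directory, and `scratch_root` is a hypothetical base directory. Each worker derives its own subdirectory from `os.getpid()`, so two processes never share an output file, while calls within one worker run one after another and can reuse that directory.

```python
import os
import tempfile


def run_external_tool(theta, workdir):
    # Hypothetical stand-in for the third-party program: it always writes
    # to the same file name, which is why concurrent runs would collide.
    with open(os.path.join(workdir, "result.txt"), "w") as f:
        f.write(f"{-0.5 * sum(x * x for x in theta)}\n")


def log_prob(theta, scratch_root):
    # One subdirectory per worker process, named after its PID.
    workdir = os.path.join(scratch_root, f"worker_{os.getpid()}")
    os.makedirs(workdir, exist_ok=True)

    # Remove output left by this worker's previous call, so a failed run
    # cannot silently return stale results.
    out_path = os.path.join(workdir, "result.txt")
    if os.path.exists(out_path):
        os.remove(out_path)

    run_external_tool(theta, workdir)
    with open(out_path) as f:
        return float(f.read())


if __name__ == "__main__":
    from multiprocessing import Pool

    with tempfile.TemporaryDirectory() as root:
        thetas = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
        with Pool(2) as pool:
            # With emcee, scratch_root would be supplied via args=(root,);
            # starmap is used here only to exercise log_prob without emcee.
            print(pool.starmap(log_prob, [(t, root) for t in thetas]))
```

With emcee, the corresponding call would be `emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(root,), pool=pool)`, since emcee calls `log_prob(theta, *args)`.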

Answer 3 · 2021-08-09T11:19:37.000Z

Hello, thank you for your patience. I ended up generating a random string as an identifier for each walker-step pair.
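One possible implementation of a per-call random identifier, as a sketch: `uuid.uuid4().hex` supplies the random string, and each `log_prob` call gets a fresh directory that is deleted afterwards. `run_external_tool`, `result.txt`, and `scratch_root` are hypothetical stand-ins for the third-party program, its output file, and a base directory. Unlike the PID approach, this does not depend on how tasks are distributed across workers, at the cost of creating and removing a directory on every evaluation.

```python
import os
import shutil
import uuid


def run_external_tool(theta, workdir):
    # Hypothetical stand-in for the third-party program (fixed output name).
    with open(os.path.join(workdir, "result.txt"), "w") as f:
        f.write(f"{-0.5 * sum(x * x for x in theta)}\n")


def log_prob(theta, scratch_root):
    # A fresh random name per call, so every walker-step pair gets its own
    # directory regardless of which process evaluates it.
    workdir = os.path.join(scratch_root, uuid.uuid4().hex)
    os.makedirs(workdir)  # no exist_ok: a name clash raises instead of sharing
    try:
        run_external_tool(theta, workdir)
        with open(os.path.join(workdir, "result.txt")) as f:
            return float(f.read())
    finally:
        # Clean up so long runs do not accumulate per-call directories.
        shutil.rmtree(workdir)
```

The standard library's `tempfile.TemporaryDirectory(dir=scratch_root)` provides the same two properties (a unique random name and automatic cleanup) and could replace the `uuid` and `shutil.rmtree` lines.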